improvement - Page 49 - Developer IT

CodePlex Daily Summary for Saturday, November 03, 2012

CodePlex Daily Summary for Saturday, November 03, 2012Popular ReleasesZXMAK2: Version 2.6.8.2: fix save to SZX snapshot for +3 model fix +3 ULA timingsLaunchbar: Launchbar 4.2.2.0: This release is the first step in cleaning up the code and using all the latest features of .NET 4.5 Changes 4.2.2 (2012-11-02) Improved handling of left clicks 4.1.0 (2012-10-17) Removed tray icon Assembly renamed and signed with strong name Note When you upgrade, Launchbar will start with the default settings. You can import your previous settings by following these steps: Run Launchbar and just save the settings without configuring anything Shutdown Launchbar Go to the folder %LOCA...CommonLibrary.NET: CommonLibrary.NET 0.9.8.8: Releases notes for FluentScript located at http://fluentscript.codeplex.com/wikipage?title=Release%20Notes&referringTitle=Documentation Fluentscript - 0.9.8.8 - Final ReleaseApplication: FluentScript Version: 0.9.8.8 Build: 0.9.8.8 Changeset: 77368 ( CommonLibrary.NET ) Release date: November 2nd, 2012 Binaries: CommonLibrary.dll Namespace: ComLib.Lang Project site: http://fluentscript.codeplex.com/ Download: http://commonlibrarynet.codeplex.com/releases/view/90426 Source code: http://common...Mouse Jiggler: MouseJiggle-1.3: This adds the much-requested minimize-to-tray feature to Mouse Jiggler.Umbraco CMS: Umbraco 4.10.0 Release Candidate: This is a Release Candidate, which means that if we do not find any major issues in the next week, we will release this version as the final release of 4.10.0 on November 9th, 2012. The documentation for the MVC bits still lives in the Github version of the docs for now and will be updated on our.umbraco.org with the final release of 4.10.0. Browse the documentation here: https://github.com/umbraco/Umbraco4Docs/tree/4.8.0/Documentation/Reference/Mvc If you want to do only MVC then make sur...Skype Auto Recorder: SkypeAutoRecorder 1.3.4: New icon and images. Reworked settings window. Implemented high-quality sound encoding. Implemented a possibility to produce stereo records. Added buttons with system-wide hot keys for manual starting and canceling of recording. Added buttons for opening folder with records. Added Help button. Fixed an issue when recording is continuing after call end. Fixed an issue when recording doesn't start. Fixed several bugs and improved stability. Major refactoring and optimization...Access 2010 Application Platform - Build Your Own Database: Application Platform - 0.0.2: Release 0.0.2 Created two new users. One belongs to the Administrators group and the other to the Public group. User: admin Pass: admin User: guest Pass: guest Initial Release This is the first version of the database. At the moment is all contained in one file to make development easier, but the obvious idea would be to split it into Front and Back End for a production version of the tool. The features it contains at the moment are the "Core" features.Python Tools for Visual Studio: Python Tools for Visual Studio 1.5: We’re pleased to announce the release of Python Tools for Visual Studio 1.5 RTM. Python Tools for Visual Studio (PTVS) is an open-source plug-in for Visual Studio which supports programming with the Python language. PTVS supports a broad range of features including CPython/IronPython, Edit/Intellisense/Debug/Profile, Cloud, HPC, IPython, etc. support. For a quick overview of the general IDE experience, please watch this video There are a number of exciting improvement in this release comp...BCF.Net: BCF.Net: BCF.Net-20121024 source codeAssaultCube Reloaded: 2.5.5: Linux has Ubuntu 11.10 32-bit precompiled binaries and Ubuntu 10.10 64-bit precompiled binaries, but you can compile your own as it also contains the source. If you are using Mac or other operating systems, please wait while we try to package for those OSes. Try to compile it. If it fails, download a virtual machine. The server pack is ready for both Windows and Linux, but you might need to compile your own for Linux (source included) Changelog: Fixed potential bot bugs: Map change, OpenAL...Edi: Edi 1.0 with DarkExpression: Added DarkExpression theme (dialogs and message boxes are not completely themed, yet)DirectX Tool Kit: October 30, 2012 (add WP8 support): October 30, 2012 Added project files for Windows Phone 8MCEBuddy 2.x: MCEBuddy 2.3.6: Changelog for 2.3.6 (32bit and 64bit) 1. Fixed a bug in multichannel audio conversion failure. AAC does not support 6 channel audio, MCEBuddy now checks for it and force the output to 2 channel if AAC codec is specified 2. Fixed a bug in Original Broadcast Date and Time. Original Broadcast Date and Time is reported in UTC timezone in WTV metadata. TVDB and MovieDB dates are reported in network timezone. It is assumed the video is recorded and converted on the same machine, i.e. local timezone...MVVM Light Toolkit: MVVM Light Toolkit V4.1 for Visual Studio 2012: This version only supports Visual Studio 2012 (and all Express editions too). If you use Visual Studio 2010, please stay tuned, we will publish an update in a few days with support for VS10. V4.1 supports: Windows Phone 8 Windows 8 (Windows RT) Silverlight 5 Silverlight 4 WPF 4.5 WPF 4 WPF 3.5 And the following development environments: Visual Studio 2012 (Pro, Premium, Ultimate) Visual Studio 2012 Express for Windows 8 Visual Studio 2012 Express for Windows Phone 8 Visual...Microsoft Ajax Minifier: Microsoft Ajax Minifier 4.73: Fix issue in Discussion #401101 (unreferenced var in a for-in statement was getting removed). add the grouping operator to the parsed output so that unminified parsed code is closer to the original. Will still strip unneeded parens later, if minifying. more cleaning of references as they are minified out of the code.RiP-Ripper & PG-Ripper: PG-Ripper 1.4.03: changes NEW: Added Support for the phun.org forum FIXED: Kitty-Kats new Forum UrlLiberty: v3.4.0.1 Release 28th October 2012: Change Log -Fixed -H4 Fixed the save verification screen showing incorrect mission and difficulty information for some saves -H4 Hopefully fixed the issue where progress did not save between missions and saves would not revert correctly -H3 Fixed crashes that occurred when trying to load player information -Proper exception dialogs will now show in place of crashesPlayer Framework by Microsoft: Player Framework for Windows 8 (Preview 7): This release is compatible with the version of the Smooth Streaming SDK released today (10/26). Release 1 of the player framework is expected to be available next week. IMPROVEMENTS & FIXESIMPORTANT: List of breaking changes from preview 6 Support for the latest smooth streaming SDK. Xaml only: Support for moving any of the UI elements outside the MediaPlayer (e.g. into the appbar). Note: Equivelent changes to the JS version due in coming week. Support for localizing all text used in t...Send multiple SMS via Way2SMS C#: SMS 1.1: Added support for 160by2Quick Launch: Quick Launch 1.0: A Lightweight and Fast Way to Manage and Launch Thousands of Tools and ApplicationsPress Win+Q and start to search and run. http://www.codeplex.com/Download?ProjectName=quicklaunch&DownloadId=523536New Projectsappnitiren: This project's main objective is to deepen my knowledge on developing apps for windows 8 and windows phone and Buddhist philosophy of Nichiren Daishonin. Aspect - TypeScript AOP Framework: A simple AOP framework for TypeScript.AWF's PowerPoint Tab: A collection of little powerpoint utilities, including "resize all shapes".Base64 Encoder-Decoder: (Base64 Encoder/Decoder using C#)BombaJob-WF: Official Windows desktop client for BombaJob.bgBungie Test: !CCAudio: A project to create an audio recording/mixing and processing framework in C#/C++ with a look to creating a Digital Audio Workstation. EvaluationSyarem: for yuanguangexporttfschangesets: Small utility that runs agains tftp.exe(tfs power tools) and exports a given range of changesets.fjfdszjtfs: fjfdszjtfsKladionica: Ovo je mala web aplikacija koja je specijalizovana za f1 kladionicu na forumu KrstaricaListViewItem Float Test: WPF testing ground for development of a "floating" listviewitem (preeettty)Macconnell: This is my test projectNWebsec Demo site: This is the NWebsec demo website project.Ocean Flattened: Summary of this projectQuagmire: QUAGMIRE aims to make distributed data processing and visualisation simple and flexible.Scheduled SQL Jobs for DotNetNuke by IowaComputerGurus Inc.: Keeping a DotNetNuke site database clean can be a real nightmare for those hosting sites on shared hosting providers.Secretary Tool: Tool for secretaries of congregations of Jehovah's Witnesses.Secure My Install for DotNetNuke by IowaComputerGurus: This module is a single-use per DotNetNuke installation utility module that will make a number of security enhancing modifications to your "out of the box" DotNServStop: ServStop is a .NET application that makes it easy to stop several system services at once. Now you don't have to change startup types or stop them one at a time. It has a simple list-based interface with the ability to save and load lists of user services to stop. Written in C#.Simple File List for DotNetNuke by IowaComputerGurus Inc.: The Simple File List module is a Free DotNetNuke module that will list all files within a specific folder under the specific portal folder within DNN for users SOLUS3 Config Discovery: An open-source community developed tool for generating a simple text file detailing your local SIMS server settings to simplify new SOLUS3 configurations.Sort Project: Project Name: Sort Project Description: This project holds a class that extends objects type of IEnumerable by enabling them using sorting algorithms other than the buil in one within the microsoft staff SQL Azure Data Protector: This tool leverages the currently available options that Azure platform has to provide an integrated and automated solution to back up the SQL Azure databases.Throttling Suite for ASP.NET Applications: The Throttling Suite provides throttling control capabilities to the ASP.NET applications. It is highly customizable; including "log-only" mode.TidyBackups: Allows for automation of deletion of Microsoft SQL backup files based on file type (.bak) creation age as well as compression using standard ZIP technology.TrainingData: TrainingData is a set of libraries for reading and modifying training data files for Garmin and Polar heart rate monitors.Virtual eCommerce: ??? ? ?????? ????????? ??????. ??????: ??????????? ?????????, ?????????? ?????????? ?? ????????????? ?? Microsoft - ASP Web Pages with Razor Engine.WebNext: My personal web page / blog with cms on asp mvc 3Windows 8 Mouse Unsticky: Makes Windows 8 top right/left hot corners unsticky.WPForms: WPForms is simple framework for Form-driven Silverlight Windows Phone applications. Using the framework allows developers to easily create and display forms.

Read the article

Network Logon Issues with Group Policy and Network

- by bobloki

I am gravely in need of your help and assistance. We have a problem with our logon and startup to our Windows 7 Enterprise system. We have more than 3000 Windows Desktops situated in roughly 20+ buildings around campus. Almost every computer on campus has the problem that I will be describing. I have spent over one month peering over etl files from Windows Performance Analyzer (A great product) and hundreds of thousands of event logs. I come to you today humbled that I could not figure this out. The problem as simply put our logon times are extremely long. An average first time logon is roughly 2-10 minutes depending on the software installed. All computers are Windows 7, the oldest computers being 5 years old. Startup times on various computers range from good (1-2 minutes) to very bad (5-60). Our second time logons range from 30 seconds to 4 minutes. We have a gigabit connection between each computer on the network. We have 5 domain controllers which also double as our DNS servers. Initial testing led us to believe that this was a software problem. So I spent a few days testing machines only to find inconsistent results from the etl files from xperfview. Each subset of computers on campus had a different subset of software issues, none seeming to interfere with logon just startup. So I started looking at our group policy and located some very interesting event ID’s. Group Policy 1129: The processing of Group Policy failed because of lack of network connectivity to a domain controller. Group Policy 1055: The processing of Group Policy failed. Windows could not resolve the computer name. This could be caused by one of more of the following: a) Name Resolution failure on the current domain controller. b) Active Directory Replication Latency (an account created on another domain controller has not replicated to the current domain controller). NETLOGON 5719 : This computer was not able to set up a secure session with a domain controller in domain OURDOMAIN due to the following: There are currently no logon servers available to service the logon request. This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator. E1kexpress 27: Intel®82567LM-3 Gigabit Network Connection – Network link is disconnected. NetBT 4300 – The driver could not be created. WMI 10 - Event filter with query "SELECT * FROM __InstanceModificationEvent WITHIN 60 WHERE TargetInstance ISA "Win32_Processor" AND TargetInstance.LoadPercentage 99" could not be reactivated in namespace "//./root/CIMV2" because of error 0x80041003. Events cannot be delivered through this filter until the problem is corrected. More or less with timestamps it becomes apparent that the network maybe the issue. 1:25:57 - Group Policy is trying to discover the domain controller information 1:25:57 - The network link has been disconnected 1:25:58 - The processing of Group Policy failed because of lack of network connectivity to a domain controller. This may be a transient condition. A success message would be generated once the machine gets connected to the domain controller and Group Policy has successfully processed. If you do not see a success message for several hours, then contact your administrator. 1:25:58 - Making LDAP calls to connect and bind to active directory. DC1.ourdomain.edu 1:25:58 - Call failed after 0 milliseconds. 1:25:58 - Forcing rediscovery of domain controller details. 1:25:58 - Group policy failed to discover the domain controller in 1030 milliseconds 1:25:58 - Periodic policy processing failed for computer OURDOMAIN\%name%$ in 1 seconds. 1:25:59 - A network link has been established at 1Gbps at full duplex 1:26:00 - The network link has been disconnected 1:26:02 - NtpClient was unable to set a domain peer to use as a time source because of discovery error. NtpClient will try again in 3473457 minutes and DOUBLE THE REATTEMPT INTERVAL thereafter. 1:26:05 - A network link has been established at 1Gbps at full duplex 1:26:08 - Name resolution for the name %Name% timed out after none of the configured DNS servers responded. 1:26:10 – The TCP/IP NetBIOS Helper service entered the running state. 1:26:11 - The time provider NtpClient is currently receiving valid time data at dc4.ourdomain.edu 1:26:14 – User Logon Notification for Customer Experience Improvement Program 1:26:15 - Group Policy received the notification Logon from Winlogon for session 1. 1:26:15 - Making LDAP calls to connect and bind to Active Directory. dc4.ourdomain.edu 1:26:18 - The LDAP call to connect and bind to Active Directory completed. dc4. ourdomain.edu. The call completed in 2309 milliseconds. 1:26:18 - Group Policy successfully discovered the Domain Controller in 2918 milliseconds. 1:26:18 - Computer details: Computer role : 2 Network name : (Blank) 1:26:18 - The LDAP call to connect and bind to Active Directory completed. dc4.ourdomain.edu. The call completed in 2309 milliseconds. 1:26:18 - Group Policy successfully discovered the Domain Controller in 2918 milliseconds. 1:26:19 - The WinHTTP Web Proxy Auto-Discovery Service service entered the running state. 1:26:46 - The Network Connections service entered the running state. 1:27:10 – Retrieved account information 1:27:10 – The system call to get account information completed. 1:27:10 - Starting policy processing due to network state change for computer OURDOMAIN\%name%$ 1:27:10 – Network state change detected 1:27:10 - Making system call to get account information. 1:27:11 - Making LDAP calls to connect and bind to Active Directory. dc4.ourdomain.edu 1:27:13 - Computer details: Computer role : 2 Network name : ourdomain.edu (Now not blank) 1:27:13 - Group Policy successfully discovered the Domain Controller in 2886 milliseconds. 1:27:13 - The LDAP call to connect and bind to Active Directory completed. dc4.ourdomain.edu The call completed in 2371 milliseconds. 1:27:15 - Estimated network bandwidth on one of the connections: 0 kbps. 1:27:15 - Estimated network bandwidth on one of the connections: 8545 kbps. 1:27:15 - A fast link was detected. The Estimated bandwidth is 8545 kbps. The slow link threshold is 500 kbps. 1:27:17 – Powershell - Engine state is changed from Available to Stopped. 1:27:20 - Completed Group Policy Local Users and Groups Extension Processing in 4539 milliseconds. 1:27:25 - Completed Group Policy Scheduled Tasks Extension Processing in 5210 milliseconds. 1:27:27 - Completed Group Policy Registry Extension Processing in 1529 milliseconds. 1:27:27 - Completed policy processing due to network state change for computer OURDOMAIN\%name%$ in 16 seconds. 1:27:27 – The Group Policy settings for the computer were processed successfully. There were no changes detected since the last successful processing of Group Policy. Any help would be appreciated. Please ask for any relevant information and it will be provided as soon as possible.

Read the article

Network Logon Issues with Group Policy and Network

- by bobloki

I am gravely in need of your help and assistance. We have a problem with our logon and startup to our Windows 7 Enterprise system. We have more than 3000 Windows Desktops situated in roughly 20+ buildings around campus. Almost every computer on campus has the problem that I will be describing. I have spent over one month peering over etl files from Windows Performance Analyzer (A great product) and hundreds of thousands of event logs. I come to you today humbled that I could not figure this out. The problem as simply put our logon times are extremely long. An average first time logon is roughly 2-10 minutes depending on the software installed. All computers are Windows 7, the oldest computers being 5 years old. Startup times on various computers range from good (1-2 minutes) to very bad (5-60). Our second time logons range from 30 seconds to 4 minutes. We have a gigabit connection between each computer on the network. We have 5 domain controllers which also double as our DNS servers. Initial testing led us to believe that this was a software problem. So I spent a few days testing machines only to find inconsistent results from the etl files from xperfview. Each subset of computers on campus had a different subset of software issues, none seeming to interfere with logon just startup. So I started looking at our group policy and located some very interesting event ID’s. Group Policy 1129: The processing of Group Policy failed because of lack of network connectivity to a domain controller. Group Policy 1055: The processing of Group Policy failed. Windows could not resolve the computer name. This could be caused by one of more of the following: a) Name Resolution failure on the current domain controller. b) Active Directory Replication Latency (an account created on another domain controller has not replicated to the current domain controller). NETLOGON 5719 : This computer was not able to set up a secure session with a domain controller in domain OURDOMAIN due to the following: There are currently no logon servers available to service the logon request. This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator. E1kexpress 27: Intel®82567LM-3 Gigabit Network Connection – Network link is disconnected. NetBT 4300 – The driver could not be created. WMI 10 - Event filter with query "SELECT * FROM __InstanceModificationEvent WITHIN 60 WHERE TargetInstance ISA "Win32_Processor" AND TargetInstance.LoadPercentage 99" could not be reactivated in namespace "//./root/CIMV2" because of error 0x80041003. Events cannot be delivered through this filter until the problem is corrected. More or less with timestamps it becomes apparent that the network maybe the issue. 1:25:57 - Group Policy is trying to discover the domain controller information 1:25:57 - The network link has been disconnected 1:25:58 - The processing of Group Policy failed because of lack of network connectivity to a domain controller. This may be a transient condition. A success message would be generated once the machine gets connected to the domain controller and Group Policy has successfully processed. If you do not see a success message for several hours, then contact your administrator. 1:25:58 - Making LDAP calls to connect and bind to active directory. DC1.ourdomain.edu 1:25:58 - Call failed after 0 milliseconds. 1:25:58 - Forcing rediscovery of domain controller details. 1:25:58 - Group policy failed to discover the domain controller in 1030 milliseconds 1:25:58 - Periodic policy processing failed for computer OURDOMAIN\%name%$ in 1 seconds. 1:25:59 - A network link has been established at 1Gbps at full duplex 1:26:00 - The network link has been disconnected 1:26:02 - NtpClient was unable to set a domain peer to use as a time source because of discovery error. NtpClient will try again in 3473457 minutes and DOUBLE THE REATTEMPT INTERVAL thereafter. 1:26:05 - A network link has been established at 1Gbps at full duplex 1:26:08 - Name resolution for the name %Name% timed out after none of the configured DNS servers responded. 1:26:10 – The TCP/IP NetBIOS Helper service entered the running state. 1:26:11 - The time provider NtpClient is currently receiving valid time data at dc4.ourdomain.edu 1:26:14 – User Logon Notification for Customer Experience Improvement Program 1:26:15 - Group Policy received the notification Logon from Winlogon for session 1. 1:26:15 - Making LDAP calls to connect and bind to Active Directory. dc4.ourdomain.edu 1:26:18 - The LDAP call to connect and bind to Active Directory completed. dc4. ourdomain.edu. The call completed in 2309 milliseconds. 1:26:18 - Group Policy successfully discovered the Domain Controller in 2918 milliseconds. 1:26:18 - Computer details: Computer role : 2 Network name : (Blank) 1:26:18 - The LDAP call to connect and bind to Active Directory completed. dc4.ourdomain.edu. The call completed in 2309 milliseconds. 1:26:18 - Group Policy successfully discovered the Domain Controller in 2918 milliseconds. 1:26:19 - The WinHTTP Web Proxy Auto-Discovery Service service entered the running state. 1:26:46 - The Network Connections service entered the running state. 1:27:10 – Retrieved account information 1:27:10 – The system call to get account information completed. 1:27:10 - Starting policy processing due to network state change for computer OURDOMAIN\%name%$ 1:27:10 – Network state change detected 1:27:10 - Making system call to get account information. 1:27:11 - Making LDAP calls to connect and bind to Active Directory. dc4.ourdomain.edu 1:27:13 - Computer details: Computer role : 2 Network name : ourdomain.edu (Now not blank) 1:27:13 - Group Policy successfully discovered the Domain Controller in 2886 milliseconds. 1:27:13 - The LDAP call to connect and bind to Active Directory completed. dc4.ourdomain.edu The call completed in 2371 milliseconds. 1:27:15 - Estimated network bandwidth on one of the connections: 0 kbps. 1:27:15 - Estimated network bandwidth on one of the connections: 8545 kbps. 1:27:15 - A fast link was detected. The Estimated bandwidth is 8545 kbps. The slow link threshold is 500 kbps. 1:27:17 – Powershell - Engine state is changed from Available to Stopped. 1:27:20 - Completed Group Policy Local Users and Groups Extension Processing in 4539 milliseconds. 1:27:25 - Completed Group Policy Scheduled Tasks Extension Processing in 5210 milliseconds. 1:27:27 - Completed Group Policy Registry Extension Processing in 1529 milliseconds. 1:27:27 - Completed policy processing due to network state change for computer OURDOMAIN\%name%$ in 16 seconds. 1:27:27 – The Group Policy settings for the computer were processed successfully. There were no changes detected since the last successful processing of Group Policy. Any help would be appreciated. Please ask for any relevant information and it will be provided as soon as possible.

Read the article

How to reduce iOS AVPlayer start delay

- by Bernt Habermeier

Note, for the below question: All assets are local on the device -- no network streaming is taking place. The videos contain audio tracks. I'm working on an iOS application that requires playing video files with minimum delay to start the video clip in question. Unfortunately we do not know what specific video clip is next until we actually need to start it up. Specifically: When one video clip is playing, we will know what the next set of (roughly) 10 video clips are, but we don't know which one exactly, until it comes time to 'immediately' play the next clip. What I've done to look at actual start delays is to call addBoundaryTimeObserverForTimes on the video player, with a time period of one millisecond to see when the video actually started to play, and I take the difference of that time stamp with the first place in the code that indicates which asset to start playing. From what I've seen thus-far, I have found that using the combination of AVAsset loading, and then creating an AVPlayerItem from that once it's ready, and then waiting for AVPlayerStatusReadyToPlay before I call play, tends to take between 1 and 3 seconds to start the clip. I've since switched to what I think is roughly equivalent: calling [AVPlayerItem playerItemWithURL:] and waiting for AVPlayerItemStatusReadyToPlay to play. Roughly same performance. One thing I'm observing is that the first AVPlayer item load is slower than the rest. Seems one idea is to pre-flight the AVPlayer with a short / empty asset before trying to play the first video might be of good general practice. [http://stackoverflow.com/questions/900461/slow-start-for-avaudioplayer-the-first-time-a-sound-is-played] I'd love to get the video start times down as much as possible, and have some ideas of things to experiment with, but would like some guidance from anyone that might be able to help. Update: idea 7, below, as-implemented yields switching times of around 500 ms. This is an improvement, but it it'd be nice to get this even faster. Idea 1: Use N AVPlayers (won't work) Using ~ 10 AVPPlayer objects and start-and-pause all ~ 10 clips, and once we know which one we really need, switch to, and un-pause the correct AVPlayer, and start all over again for the next cycle. I don't think this works, because I've read there is roughly a limit of 4 active AVPlayer's in iOS. There was someone asking about this on StackOverflow here, and found out about the 4 AVPlayer limit: fast-switching-between-videos-using-avfoundation Idea 2: Use AVQueuePlayer (won't work) I don't believe that shoving 10 AVPlayerItems into an AVQueuePlayer would pre-load them all for seamless start. AVQueuePlayer is a queue, and I think it really only makes the next video in the queue ready for immediate playback. I don't know which one out of ~10 videos we do want to play back, until it's time to start that one. ios-avplayer-video-preloading Idea 3: Load, Play, and retain AVPlayerItems in background (not 100% sure yet -- but not looking good) I'm looking at if there is any benefit to load and play the first second of each video clip in the background (suppress video and audio output), and keep a reference to each AVPlayerItem, and when we know which item needs to be played for real, swap that one in, and swap the background AVPlayer with the active one. Rinse and Repeat. The theory would be that recently played AVPlayer/AVPlayerItem's may still hold some prepared resources which would make subsequent playback faster. So far, I have not seen benefits from this, but I might not have the AVPlayerLayer setup correctly for the background. I doubt this will really improve things from what I've seen. Idea 4: Use a different file format -- maybe one that is faster to load? I'm currently using .m4v's (video-MPEG4) H.264 format. I have not played around with other formats, but it may well be that some formats are faster to decode / get ready than others. Possible still using video-MPEG4 but with a different codec, or maybe quicktime? Maybe a lossless video format where decoding / setup is faster? Idea 5: Combination of lossless video format + AVQueuePlayer If there is a video format that is fast to load, but maybe where the file size is insane, one idea might be to pre-prepare the first 10 seconds of each video clip with a version that is boated but faster to load, but back that up with an asset that is encoded in H.264. Use an AVQueuePlayer, and add the first 10 seconds in the uncompressed file format, and follow that up with one that is in H.264 which gets up to 10 seconds of prepare/preload time. So I'd get 'the best' of both worlds: fast start times, but also benefits from a more compact format. Idea 6: Use a non-standard AVPlayer / write my own / use someone else's Given my needs, maybe I can't use AVPlayer, but have to resort to AVAssetReader, and decode the first few seconds (possibly write raw file to disk), and when it comes to playback, make use of the raw format to play it back fast. Seems like a huge project to me, and if I go about it in a naive way, it's unclear / unlikely to even work better. Each decoded and uncompressed video frame is 2.25 MB. Naively speaking -- if we go with ~ 30 fps for the video, I'd end up with ~60 MB/s read-from-disk requirement, which is probably impossible / pushing it. Obviously we'd have to do some level of image compression (perhaps native openGL/es compression formats via PVRTC)... but that's kind crazy. Maybe there is a library out there that I can use? Idea 7: Combine everything into a single movie asset, and seekToTime One idea that might be easier than some of the above, is to combine everything into a single movie, and use seekToTime. The thing is that we'd be jumping all around the place. Essentially random access into the movie. I think this may actually work out okay: avplayer-movie-playing-lag-in-ios5 Which approach do you think would be best? So far, I've not made that much progress in terms of reducing the lag.

Read the article

How to access members of an rdf list with rdflib (or plain sparql)

- by tjb

What is the best way to access the members of an rdf list? I'm using rdflib (python) but an answer given in plain SPARQL is also ok (this type of answer can be used through rdfextras, a rdflib helper library). I'm trying to access the authors of a particular journal article in rdf produced by Zotero (some fields have been removed for brevity): <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:z="http://www.zotero.org/namespaces/export#" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:bib="http://purl.org/net/biblio#" xmlns:foaf="http://xmlns.com/foaf/0.1/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/" xmlns:link="http://purl.org/rss/1.0/modules/link/"> <bib:Article rdf:about="http://www.ncbi.nlm.nih.gov/pubmed/18273724"> <z:itemType>journalArticle</z:itemType> <dcterms:isPartOf rdf:resource="urn:issn:0954-6634"/> <bib:authors> <rdf:Seq> <rdf:li> <foaf:Person> <foaf:surname>Lee</foaf:surname> <foaf:givenname>Hyoun Seung</foaf:givenname> </foaf:Person> </rdf:li> <rdf:li> <foaf:Person> <foaf:surname>Lee</foaf:surname> <foaf:givenname>Jong Hee</foaf:givenname> </foaf:Person> </rdf:li> <rdf:li> <foaf:Person> <foaf:surname>Ahn</foaf:surname> <foaf:givenname>Gun Young</foaf:givenname> </foaf:Person> </rdf:li> <rdf:li> <foaf:Person> <foaf:surname>Lee</foaf:surname> <foaf:givenname>Dong Hun</foaf:givenname> </foaf:Person> </rdf:li> <rdf:li> <foaf:Person> <foaf:surname>Shin</foaf:surname> <foaf:givenname>Jung Won</foaf:givenname> </foaf:Person> </rdf:li> <rdf:li> <foaf:Person> <foaf:surname>Kim</foaf:surname> <foaf:givenname>Dong Hyun</foaf:givenname> </foaf:Person> </rdf:li> <rdf:li> <foaf:Person> <foaf:surname>Chung</foaf:surname> <foaf:givenname>Jin Ho</foaf:givenname> </foaf:Person> </rdf:li> </rdf:Seq> </bib:authors> <dc:title>Fractional photothermolysis for the treatment of acne scars: a report of 27 Korean patients</dc:title> <dcterms:abstract>OBJECTIVES: Atrophic post-acne scarring remains a therapeutically challe *CUT*, erythema and edema. CONCLUSIONS: The 1550-nm erbium-doped FP is associated with significant patient-reported improvement in the appearance of acne scars, with minimal downtime.</dcterms:abstract> <bib:pages>45-49</bib:pages> <dc:date>2008</dc:date> <z:shortTitle>Fractional photothermolysis for the treatment of acne scars</z:shortTitle> <dc:identifier> <dcterms:URI> <rdf:value>http://www.ncbi.nlm.nih.gov/pubmed/18273724</rdf:value> </dcterms:URI> </dc:identifier> <dcterms:dateSubmitted>2010-12-06 11:36:52</dcterms:dateSubmitted> <z:libraryCatalog>NCBI PubMed</z:libraryCatalog> <dc:description>PMID: 18273724</dc:description> </bib:Article> <bib:Journal rdf:about="urn:issn:0954-6634"> <dc:title>The Journal of Dermatological Treatment</dc:title> <prism:volume>19</prism:volume> <prism:number>1</prism:number> <dcterms:alternative>J Dermatolog Treat</dcterms:alternative> <dc:identifier>DOI 10.1080/09546630701691244</dc:identifier> <dc:identifier>ISSN 0954-6634</dc:identifier> </bib:Journal>

Read the article

Random Page Cost and Planning

- by Dave Jarvis

A query (see below) that extracts climate data from weather stations within a given radius of a city using the dates for which those weather stations actually have data. The query uses the table's only index, rather effectively: CREATE UNIQUE INDEX measurement_001_stc_idx ON climate.measurement_001 USING btree (station_id, taken, category_id); Reducing the server's configuration value for random_page_cost from 2.0 to 1.1 had a massive performance improvement for the given range (nearly an order of magnitude) because it suggested to PostgreSQL that it should use the index. While the results now return in 5 seconds (down from ~85 seconds), problematic lines remain. Bumping the query's end date by a single year causes a full table scan: sc.taken_start >= '1900-01-01'::date AND sc.taken_end <= '1997-12-31'::date AND How do I persuade PostgreSQL to use the indexes regardless of years between the two dates? (A full table scan against 43 million rows is probably not the best plan.) Find the EXPLAIN ANALYSE results below the query. Thank you! Query SELECT extract(YEAR FROM m.taken) AS year, avg(m.amount) AS amount FROM climate.city c, climate.station s, climate.station_category sc, climate.measurement m WHERE c.id = 5182 AND earth_distance( ll_to_earth(c.latitude_decimal,c.longitude_decimal), ll_to_earth(s.latitude_decimal,s.longitude_decimal)) / 1000 <= 30 AND s.elevation BETWEEN 0 AND 3000 AND s.applicable = TRUE AND sc.station_id = s.id AND sc.category_id = 1 AND sc.taken_start >= '1900-01-01'::date AND sc.taken_end <= '1996-12-31'::date AND m.station_id = s.id AND m.taken BETWEEN sc.taken_start AND sc.taken_end AND m.category_id = sc.category_id GROUP BY extract(YEAR FROM m.taken) ORDER BY extract(YEAR FROM m.taken) 1900 to 1996: Index "Sort (cost=1348597.71..1348598.21 rows=200 width=12) (actual time=2268.929..2268.935 rows=92 loops=1)" " Sort Key: (date_part('year'::text, (m.taken)::timestamp without time zone))" " Sort Method: quicksort Memory: 32kB" " -> HashAggregate (cost=1348586.56..1348590.06 rows=200 width=12) (actual time=2268.829..2268.886 rows=92 loops=1)" " -> Nested Loop (cost=0.00..1344864.01 rows=744510 width=12) (actual time=0.807..2084.206 rows=134893 loops=1)" " Join Filter: ((m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (sc.station_id = m.station_id))" " -> Nested Loop (cost=0.00..12755.07 rows=1220 width=18) (actual time=0.502..521.937 rows=23 loops=1)" " Join Filter: ((sec_to_gc(cube_distance((ll_to_earth((c.latitude_decimal)::double precision, (c.longitude_decimal)::double precision))::cube, (ll_to_earth((s.latitude_decimal)::double precision, (s.longitude_decimal)::double precision))::cube)) / 1000::double precision) <= 30::double precision)" " -> Index Scan using city_pkey1 on city c (cost=0.00..2.47 rows=1 width=16) (actual time=0.014..0.015 rows=1 loops=1)" " Index Cond: (id = 5182)" " -> Nested Loop (cost=0.00..9907.73 rows=3659 width=34) (actual time=0.014..28.937 rows=3458 loops=1)" " -> Seq Scan on station_category sc (cost=0.00..970.20 rows=3659 width=14) (actual time=0.008..10.947 rows=3458 loops=1)" " Filter: ((taken_start >= '1900-01-01'::date) AND (taken_end <= '1996-12-31'::date) AND (category_id = 1))" " -> Index Scan using station_pkey1 on station s (cost=0.00..2.43 rows=1 width=20) (actual time=0.004..0.004 rows=1 loops=3458)" " Index Cond: (s.id = sc.station_id)" " Filter: (s.applicable AND (s.elevation >= 0) AND (s.elevation <= 3000))" " -> Append (cost=0.00..1072.27 rows=947 width=18) (actual time=6.996..63.199 rows=5865 loops=23)" " -> Seq Scan on measurement m (cost=0.00..25.00 rows=6 width=22) (actual time=0.000..0.000 rows=0 loops=23)" " Filter: (m.category_id = 1)" " -> Bitmap Heap Scan on measurement_001 m (cost=20.79..1047.27 rows=941 width=18) (actual time=6.995..62.390 rows=5865 loops=23)" " Recheck Cond: ((m.station_id = sc.station_id) AND (m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (m.category_id = 1))" " -> Bitmap Index Scan on measurement_001_stc_idx (cost=0.00..20.55 rows=941 width=0) (actual time=5.775..5.775 rows=5865 loops=23)" " Index Cond: ((m.station_id = sc.station_id) AND (m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (m.category_id = 1))" "Total runtime: 2269.264 ms" 1900 to 1997: Full Table Scan "Sort (cost=1370192.26..1370192.76 rows=200 width=12) (actual time=86165.797..86165.809 rows=94 loops=1)" " Sort Key: (date_part('year'::text, (m.taken)::timestamp without time zone))" " Sort Method: quicksort Memory: 32kB" " -> HashAggregate (cost=1370181.12..1370184.62 rows=200 width=12) (actual time=86165.654..86165.736 rows=94 loops=1)" " -> Hash Join (cost=4293.60..1366355.81 rows=765061 width=12) (actual time=534.786..85920.007 rows=139721 loops=1)" " Hash Cond: (m.station_id = sc.station_id)" " Join Filter: ((m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end))" " -> Append (cost=0.00..867005.80 rows=43670150 width=18) (actual time=0.009..79202.329 rows=43670079 loops=1)" " -> Seq Scan on measurement m (cost=0.00..25.00 rows=6 width=22) (actual time=0.001..0.001 rows=0 loops=1)" " Filter: (category_id = 1)" " -> Seq Scan on measurement_001 m (cost=0.00..866980.80 rows=43670144 width=18) (actual time=0.008..73312.008 rows=43670079 loops=1)" " Filter: (category_id = 1)" " -> Hash (cost=4277.93..4277.93 rows=1253 width=18) (actual time=534.704..534.704 rows=25 loops=1)" " -> Nested Loop (cost=847.87..4277.93 rows=1253 width=18) (actual time=415.837..534.682 rows=25 loops=1)" " Join Filter: ((sec_to_gc(cube_distance((ll_to_earth((c.latitude_decimal)::double precision, (c.longitude_decimal)::double precision))::cube, (ll_to_earth((s.latitude_decimal)::double precision, (s.longitude_decimal)::double precision))::cube)) / 1000::double precision) <= 30::double precision)" " -> Index Scan using city_pkey1 on city c (cost=0.00..2.47 rows=1 width=16) (actual time=0.012..0.014 rows=1 loops=1)" " Index Cond: (id = 5182)" " -> Hash Join (cost=847.87..1352.07 rows=3760 width=34) (actual time=6.427..35.107 rows=3552 loops=1)" " Hash Cond: (s.id = sc.station_id)" " -> Seq Scan on station s (cost=0.00..367.25 rows=7948 width=20) (actual time=0.004..23.529 rows=7949 loops=1)" " Filter: (applicable AND (elevation >= 0) AND (elevation <= 3000))" " -> Hash (cost=800.87..800.87 rows=3760 width=14) (actual time=6.416..6.416 rows=3552 loops=1)" " -> Bitmap Heap Scan on station_category sc (cost=430.29..800.87 rows=3760 width=14) (actual time=2.316..5.353 rows=3552 loops=1)" " Recheck Cond: (category_id = 1)" " Filter: ((taken_start >= '1900-01-01'::date) AND (taken_end <= '1997-12-31'::date))" " -> Bitmap Index Scan on station_category_station_category_idx (cost=0.00..429.35 rows=6376 width=0) (actual time=2.268..2.268 rows=6339 loops=1)" " Index Cond: (category_id = 1)" "Total runtime: 86165.936 ms"

Read the article

Cascading S3 Sink Tap not being deleted with SinkMode.REPLACE

- by Eric Charles

We are running Cascading with a Sink Tap being configured to store in Amazon S3 and were facing some FileAlreadyExistsException (see [1]). This was only from time to time (1 time on around 100) and was not reproducable. Digging into the Cascading codem, we discovered the Hfs.deleteResource() is called (among others) by the BaseFlow.deleteSinksIfNotUpdate(). Btw, we were quite intrigued with the silent NPE (with comment "hack to get around npe thrown when fs reaches root directory"). From there, we extended the Hfs tap with our own Tap to add more action in the deleteResource() method (see [2]) with a retry mechanism calling directly the getFileSystem(conf).delete. The retry mechanism seemed to bring improvement, but we are still sometimes facing failures (see example in [3]): it sounds like HDFS returns isDeleted=true, but asking directly after if the folder exists, we receive exists=true, which should not happen. Logs also shows randomly isDeleted true or false when the flow succeeds, which sounds like the returned value is irrelevant or not to be trusted. Can anybody bring his own S3 experience with such a behavior: "folder should be deleted, but it is not"? We suspect a S3 issue, but could it also be in Cascading or HDFS? We run on Hadoop Cloudera-cdh3u5 and Cascading 2.0.1-wip-dev. [1] org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory s3n://... already exists at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132) at com.twitter.elephantbird.mapred.output.DeprecatedOutputFormatWrapper.checkOutputSpecs(DeprecatedOutputFormatWrapper.java:75) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:923) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:882) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:882) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:856) at cascading.flow.hadoop.planner.HadoopFlowStepJob.internalNonBlockingStart(HadoopFlowStepJob.java:104) at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:174) at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:137) at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:122) at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:42) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.j [2] @Override public boolean deleteResource(JobConf conf) throws IOException { LOGGER.info("Deleting resource {}", getIdentifier()); boolean isDeleted = super.deleteResource(conf); LOGGER.info("Hfs Sink Tap isDeleted is {} for {}", isDeleted, getIdentifier()); Path path = new Path(getIdentifier()); int retryCount = 0; int cumulativeSleepTime = 0; int sleepTime = 1000; while (getFileSystem(conf).exists(path)) { LOGGER .info( "Resource {} still exists, it should not... - I will continue to wait patiently...", getIdentifier()); try { LOGGER.info("Now I will sleep " + sleepTime / 1000 + " seconds while trying to delete {} - attempt: {}", getIdentifier(), retryCount + 1); Thread.sleep(sleepTime); cumulativeSleepTime += sleepTime; sleepTime *= 2; } catch (InterruptedException e) { e.printStackTrace(); LOGGER .error( "Interrupted while sleeping trying to delete {} with message {}...", getIdentifier(), e.getMessage()); throw new RuntimeException(e); } if (retryCount == 0) { getFileSystem(conf).delete(getPath(), true); } retryCount++; if (cumulativeSleepTime > MAXIMUM_TIME_TO_WAIT_TO_DELETE_MS) { break; } } if (getFileSystem(conf).exists(path)) { LOGGER .error( "We didn't succeed to delete the resource {}. Throwing now a runtime exception.", getIdentifier()); throw new RuntimeException( "Although we waited to delete the resource for " + getIdentifier() + ' ' + retryCount + " iterations, it still exists - This must be an issue in the underlying storage system."); } return isDeleted; } [3] INFO [pool-2-thread-15] (BaseFlow.java:1287) - [...] at least one sink is marked for delete INFO [pool-2-thread-15] (BaseFlow.java:1287) - [...] sink oldest modified date: Wed Dec 31 23:59:59 UTC 1969 INFO [pool-2-thread-15] (HiveSinkTap.java:148) - Now I will sleep 1 seconds while trying to delete s3n://... - attempt: 1 INFO [pool-2-thread-15] (HiveSinkTap.java:130) - Deleting resource s3n://... INFO [pool-2-thread-15] (HiveSinkTap.java:133) - Hfs Sink Tap isDeleted is true for s3n://... ERROR [pool-2-thread-15] (HiveSinkTap.java:175) - We didn't succeed to delete the resource s3n://... Throwing now a runtime exception. WARN [pool-2-thread-15] (Cascade.java:706) - [...] flow failed: ... java.lang.RuntimeException: Although we waited to delete the resource for s3n://... 0 iterations, it still exists - This must be an issue in the underlying storage system. at com.qubit.hive.tap.HiveSinkTap.deleteResource(HiveSinkTap.java:179) at com.qubit.hive.tap.HiveSinkTap.deleteResource(HiveSinkTap.java:40) at cascading.flow.BaseFlow.deleteSinksIfNotUpdate(BaseFlow.java:971) at cascading.flow.BaseFlow.prepare(BaseFlow.java:733) at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:761) at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:710) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619)

Read the article

C++0x rvalue references - lvalues-rvalue binding

- by Doug

This is a follow-on question to http://stackoverflow.com/questions/2748866/c0x-rvalue-references-and-temporaries In the previous question, I asked how this code should work: void f(const std::string &); //less efficient void f(std::string &&); //more efficient void g(const char * arg) { f(arg); } It seems that the move overload should probably be called because of the implicit temporary, and this happens in GCC but not MSVC (or the EDG front-end used in MSVC's Intellisense). What about this code? void f(std::string &&); //NB: No const string & overload supplied void g1(const char * arg) { f(arg); } void g2(const std::string & arg) { f(arg); } It seems that, based on the answers to my previous question that function g1 is legal (and is accepted by GCC 4.3-4.5, but not by MSVC). However, GCC and MSVC both reject g2 because of clause 13.3.3.1.4/3, which prohibits lvalues from binding to rvalue ref arguments. I understand the rationale behind this - it is explained in N2831 "Fixing a safety problem with rvalue references". I also think that GCC is probably implementing this clause as intended by the authors of that paper, because the original patch to GCC was written by one of the authors (Doug Gregor). However, I don't this is quite intuitive. To me, (a) a const string & is conceptually closer to a string && than a const char *, and (b) the compiler could create a temporary string in g2, as if it were written like this: void g2(const std::string & arg) { f(std::string(arg)); } Indeed, sometimes the copy constructor is considered to be an implicit conversion operator. Syntactically, this is suggested by the form of a copy constructor, and the standard even mentions this specifically in clause 13.3.3.1.2/4, where the copy constructor for derived-base conversions is given a higher conversion rank than other implicit conversions: A conversion of an expression of class type to the same class type is given Exact Match rank, and a conversion of an expression of class type to a base class of that type is given Conversion rank, in spite of the fact that a copy/move constructor (i.e., a user-defined conversion function) is called for those cases. (I assume this is used when passing a derived class to a function like void h(Base), which takes a base class by value.) Motivation My motivation for asking this is something like the question asked in http://stackoverflow.com/questions/2696156/how-to-reduce-redundant-code-when-adding-new-c0x-rvalue-reference-operator-over ("How to reduce redundant code when adding new c++0x rvalue reference operator overloads"). If you have a function that accepts a number of potentially-moveable arguments, and would move them if it can (e.g. a factory function/constructor: Object create_object(string, vector<string>, string) or the like), and want to move or copy each argument as appropriate, you quickly start writing a lot of code. If the argument types are movable, then one could just write one version that accepts the arguments by value, as above. But if the arguments are (legacy) non-movable-but-swappable classes a la C++03, and you can't change them, then writing rvalue reference overloads is more efficient. So if lvalues did bind to rvalues via an implicit copy, then you could write just one overload like create_object(legacy_string &&, legacy_vector<legacy_string> &&, legacy_string &&) and it would more or less work like providing all the combinations of rvalue/lvalue reference overloads - actual arguments that were lvalues would get copied and then bound to the arguments, actual arguments that were rvalues would get directly bound. Questions My questions are then: Is this a valid interpretation of the standard? It seems that it's not the conventional or intended one, at any rate. Does it make intuitive sense? Is there a problem with this idea that I"m not seeing? It seems like you could get copies being quietly created when that's not exactly expected, but that's the status quo in places in C++03 anyway. Also, it would make some overloads viable when they're currently not, but I don't see it being a problem in practice. Is this a significant enough improvement that it would be worth making e.g. an experimental patch for GCC?

Read the article

Optimizing python code performance when importing zipped csv to a mongo collection

- by mark

I need to import a zipped csv into a mongo collection, but there is a catch - every record contains a timestamp in Pacific Time, which must be converted to the local time corresponding to the (longitude,latitude) pair found in the same record. The code looks like so: def read_csv_zip(path, timezones): with ZipFile(path) as z, z.open(z.namelist()[0]) as input: csv_rows = csv.reader(input) header = csv_rows.next() check,converters = get_aux_stuff(header) for csv_row in csv_rows: if check(csv_row): row = { converter[0]:converter[1](value) for converter, value in zip(converters, csv_row) if allow_field(converter) } ts = row['ts'] lng, lat = row['loc'] found_tz_entry = timezones.find_one(SON({'loc': {'$within': {'$box': [[lng-tz_lookup_radius, lat-tz_lookup_radius],[lng+tz_lookup_radius, lat+tz_lookup_radius]]}}})) if found_tz_entry: tz_name = found_tz_entry['tz'] local_ts = ts.astimezone(timezone(tz_name)).replace(tzinfo=None) row['tz'] = tz_name else: local_ts = (ts.astimezone(utc) + timedelta(hours = int(lng/15))).replace(tzinfo = None) row['local_ts'] = local_ts yield row def insert_documents(collection, source, batch_size): while True: items = list(itertools.islice(source, batch_size)) if len(items) == 0: break; try: collection.insert(items) except: for item in items: try: collection.insert(item) except Exception as exc: print("Failed to insert record {0} - {1}".format(item['_id'], exc)) def main(zip_path): with Connection() as connection: data = connection.mydb.data timezones = connection.timezones.data insert_documents(data, read_csv_zip(zip_path, timezones), 1000) The code proceeds as follows: Every record read from the csv is checked and converted to a dictionary, where some fields may be skipped, some titles be renamed (from those appearing in the csv header), some values may be converted (to datetime, to integers, to floats. etc ...) For each record read from the csv, a lookup is made into the timezones collection to map the record location to the respective time zone. If the mapping is successful - that timezone is used to convert the record timestamp (pacific time) to the respective local timestamp. If no mapping is found - a rough approximation is calculated. The timezones collection is appropriately indexed, of course - calling explain() confirms it. The process is slow. Naturally, having to query the timezones collection for every record kills the performance. I am looking for advises on how to improve it. Thanks. EDIT The timezones collection contains 8176040 records, each containing four values: > db.data.findOne() { "_id" : 3038814, "loc" : [ 1.48333, 42.5 ], "tz" : "Europe/Andorra" } EDIT2 OK, I have compiled a release build of http://toblerity.github.com/rtree/ and configured the rtree package. Then I have created an rtree dat/idx pair of files corresponding to my timezones collection. So, instead of calling collection.find_one I call index.intersection. Surprisingly, not only there is no improvement, but it works even more slowly now! May be rtree could be fine tuned to load the entire dat/idx pair into RAM (704M), but I do not know how to do it. Until then, it is not an alternative. In general, I think the solution should involve parallelization of the task. EDIT3 Profile output when using collection.find_one: >>> p.sort_stats('cumulative').print_stats(10) Tue Apr 10 14:28:39 2012 ImportDataIntoMongo.profile 64549590 function calls (64549180 primitive calls) in 1231.257 seconds Ordered by: cumulative time List reduced from 730 to 10 due to restriction <10> ncalls tottime percall cumtime percall filename:lineno(function) 1 0.012 0.012 1231.257 1231.257 ImportDataIntoMongo.py:1(<module>) 1 0.001 0.001 1230.959 1230.959 ImportDataIntoMongo.py:187(main) 1 853.558 853.558 853.558 853.558 {raw_input} 1 0.598 0.598 370.510 370.510 ImportDataIntoMongo.py:165(insert_documents) 343407 9.965 0.000 359.034 0.001 ImportDataIntoMongo.py:137(read_csv_zip) 343408 2.927 0.000 287.035 0.001 c:\python27\lib\site-packages\pymongo\collection.py:489(find_one) 343408 1.842 0.000 274.803 0.001 c:\python27\lib\site-packages\pymongo\cursor.py:699(next) 343408 2.542 0.000 271.212 0.001 c:\python27\lib\site-packages\pymongo\cursor.py:644(_refresh) 343408 4.512 0.000 253.673 0.001 c:\python27\lib\site-packages\pymongo\cursor.py:605(__send_message) 343408 0.971 0.000 242.078 0.001 c:\python27\lib\site-packages\pymongo\connection.py:871(_send_message_with_response) Profile output when using index.intersection: >>> p.sort_stats('cumulative').print_stats(10) Wed Apr 11 16:21:31 2012 ImportDataIntoMongo.profile 41542960 function calls (41542536 primitive calls) in 2889.164 seconds Ordered by: cumulative time List reduced from 778 to 10 due to restriction <10> ncalls tottime percall cumtime percall filename:lineno(function) 1 0.028 0.028 2889.164 2889.164 ImportDataIntoMongo.py:1(<module>) 1 0.017 0.017 2888.679 2888.679 ImportDataIntoMongo.py:202(main) 1 2365.526 2365.526 2365.526 2365.526 {raw_input} 1 0.766 0.766 502.817 502.817 ImportDataIntoMongo.py:180(insert_documents) 343407 9.147 0.000 491.433 0.001 ImportDataIntoMongo.py:152(read_csv_zip) 343406 0.571 0.000 391.394 0.001 c:\python27\lib\site-packages\rtree-0.7.0-py2.7.egg\rtree\index.py:384(intersection) 343406 379.957 0.001 390.824 0.001 c:\python27\lib\site-packages\rtree-0.7.0-py2.7.egg\rtree\index.py:435(_intersection_obj) 686513 22.616 0.000 38.705 0.000 c:\python27\lib\site-packages\rtree-0.7.0-py2.7.egg\rtree\index.py:451(_get_objects) 343406 6.134 0.000 33.326 0.000 ImportDataIntoMongo.py:162(<dictcomp>) 346 0.396 0.001 30.665 0.089 c:\python27\lib\site-packages\pymongo\collection.py:240(insert) EDIT4 I have parallelized the code, but the results are still not very encouraging. I am convinced it could be done better. See my own answer to this question for details.

Read the article

Microsoft and jQuery

- by Rick Strahl

The jQuery JavaScript library has been steadily getting more popular and with recent developments from Microsoft, jQuery is also getting ever more exposure on the ASP.NET platform including now directly from Microsoft. jQuery is a light weight, open source DOM manipulation library for JavaScript that has changed how many developers think about JavaScript. You can download it and find more information on jQuery on www.jquery.com. For me jQuery has had a huge impact on how I develop Web applications and was probably the main reason I went from dreading to do JavaScript development to actually looking forward to implementing client side JavaScript functionality. It has also had a profound impact on my JavaScript skill level for me by seeing how the library accomplishes things (and often reviewing the terse but excellent source code). jQuery made an uncomfortable development platform (JavaScript + DOM) a joy to work on. Although jQuery is by no means the only JavaScript library out there, its ease of use, small size, huge community of plug-ins and pure usefulness has made it easily the most popular JavaScript library available today. As a long time jQuery user, I’ve been excited to see the developments from Microsoft that are bringing jQuery to more ASP.NET developers and providing more integration with jQuery for ASP.NET’s core features rather than relying on the ASP.NET AJAX library. Microsoft and jQuery – making Friends jQuery is an open source project but in the last couple of years Microsoft has really thrown its weight behind supporting this open source library as a supported component on the Microsoft platform. When I say supported I literally mean supported: Microsoft now offers actual tech support for jQuery as part of their Product Support Services (PSS) as jQuery integration has become part of several of the ASP.NET toolkits and ships in several of the default Web project templates in Visual Studio 2010. The ASP.NET MVC 3 framework (still in Beta) also uses jQuery for a variety of client side support features including client side validation and we can look forward toward more integration of client side functionality via jQuery in both MVC and WebForms in the future. In other words jQuery is becoming an optional but included component of the ASP.NET platform. PSS support means that support staff will answer jQuery related support questions as part of any support incidents related to ASP.NET which provides some piece of mind to some corporate development shops that require end to end support from Microsoft. In addition to including jQuery and supporting it, Microsoft has also been getting involved in providing development resources for extending jQuery’s functionality via plug-ins. Microsoft’s last version of the Microsoft Ajax Library – which is the successor to the native ASP.NET AJAX Library – included some really cool functionality for client templates, databinding and localization. As it turns out Microsoft has rebuilt most of that functionality using jQuery as the base API and provided jQuery plug-ins of these components. Very recently these three plug-ins were submitted and have been approved for inclusion in the official jQuery plug-in repository and been taken over by the jQuery team for further improvements and maintenance. Even more surprising: The jQuery-templates component has actually been approved for inclusion in the next major update of the jQuery core in jQuery V1.5, which means it will become a native feature that doesn’t require additional script files to be loaded. Imagine this – an open source contribution from Microsoft that has been accepted into a major open source project for a core feature improvement. Microsoft has come a long way indeed! What the Microsoft Involvement with jQuery means to you For Microsoft jQuery support is a strategic decision that affects their direction in client side development, but nothing stopped you from using jQuery in your applications prior to Microsoft’s official backing and in fact a large chunk of developers did so readily prior to Microsoft’s announcement. Official support from Microsoft brings a few benefits to developers however. jQuery support in Visual Studio 2010 means built-in support for jQuery IntelliSense, automatically added jQuery scripts in many projects types and a common base for client side functionality that actually uses what most developers are already using. If you have already been using jQuery and were worried about straying from the Microsoft line and their internal Microsoft Ajax Library – worry no more. With official support and the change in direction towards jQuery Microsoft is now following along what most in the ASP.NET community had already been doing by using jQuery, which is likely the reason for Microsoft’s shift in direction in the first place. ASP.NET AJAX and the Microsoft AJAX Library weren’t bad technology – there was tons of useful functionality buried in these libraries. However, these libraries never got off the ground, mainly because early incarnations were squarely aimed at control/component developers rather than application developers. For all the functionality that these controls provided for control developers they lacked in useful and easily usable application developer functionality that was easily accessible in day to day client side development. The result was that even though Microsoft shipped support for these tools in the box (in .NET 3.5 and 4.0), other than for the internal support in ASP.NET for things like the UpdatePanel and the ASP.NET AJAX Control Toolkit as well as some third party vendors, the Microsoft client libraries were largely ignored by the developer community opening the door for other client side solutions. Microsoft seems to be acknowledging developer choice in this case: Many more developers were going down the jQuery path rather than using the Microsoft built libraries and there seems to be little sense in continuing development of a technology that largely goes unused by the majority of developers. Kudos for Microsoft for recognizing this and gracefully changing directions. Note that even though there will be no further development in the Microsoft client libraries they will continue to be supported so if you’re using them in your applications there’s no reason to start running for the exit in a panic and start re-writing everything with jQuery. Although that might be a reasonable choice in some cases, jQuery and the Microsoft libraries work well side by side so that you can leave existing solutions untouched even as you enhance them with jQuery. The Microsoft jQuery Plug-ins – Solid Core Features One of the most interesting developments in Microsoft’s embracing of jQuery is that Microsoft has started contributing to jQuery via standard mechanism set for jQuery developers: By submitting plug-ins. Microsoft took some of the nicest new features of the unpublished Microsoft Ajax Client Library and re-wrote these components for jQuery and then submitted them as plug-ins to the jQuery plug-in repository. Accepted plug-ins get taken over by the jQuery team and that’s exactly what happened with the three plug-ins submitted by Microsoft with the templating plug-in even getting slated to be published as part of the jQuery core in the next major release (1.5). The following plug-ins are provided by Microsoft: jQuery Templates – a client side template rendering engine jQuery Data Link – a client side databinder that can synchronize changes without code jQuery Globalization – provides formatting and conversion features for dates and numbers The first two are ports of functionality that was slated for the Microsoft Ajax Library while functionality for the globalization library provides functionality that was already found in the original ASP.NET AJAX library. To me all three plug-ins address a pressing need in client side applications and provide functionality I’ve previously used in other incarnations, but with more complete implementations. Let’s take a close look at these plug-ins. jQuery Templates http://api.jquery.com/category/plugins/templates/ Client side templating is a key component for building rich JavaScript applications in the browser. Templating on the client lets you avoid from manually creating markup by creating DOM nodes and injecting them individually into the document via code. Rather you can create markup templates – similar to the way you create classic ASP server markup – and merge data into these templates to render HTML which you can then inject into the document or replace existing content with. Output from templates are rendered as a jQuery matched set and can then be easily inserted into the document as needed. Templating is key to minimize client side code and reduce repeated code for rendering logic. Instead a single template can be used in many places for updating and adding content to existing pages. Further if you build pure AJAX interfaces that rely entirely on client rendering of the initial page content, templates allow you to a use a single markup template to handle all rendering of each specific HTML section/element. I’ve used a number of different client rendering template engines with jQuery in the past including jTemplates (a PHP style templating engine) and a modified version of John Resig’s MicroTemplating engine which I built into my own set of libraries because it’s such a commonly used feature in my client side applications. jQuery templates adds a much richer templating model that allows for sub-templates and access to the data items. Like John Resig’s original Micro Template engine, the core basics of the templating engine create JavaScript code which means that templates can include JavaScript code. To give you a basic idea of how templates work imagine I have an application that downloads a set of stock quotes based on a symbol list then displays them in the document. To do this you can create an ‘item’ template that describes how each of the quotes is renderd as a template inside of the document: <script id="stockTemplate" type="text/x-jquery-tmpl"> <div id="divStockQuote" class="errordisplay" style="width: 500px;"> <div class="label">Company:</div><div><b>${Company}(${Symbol})</b></div> <div class="label">Last Price:</div><div>${LastPrice}</div> <div class="label">Net Change:</div><div> {{if NetChange > 0}} <b style="color:green" >${NetChange}</b> {{else}} <b style="color:red" >${NetChange}</b> {{/if}} </div> <div class="label">Last Update:</div><div>${LastQuoteTimeString}</div> </div> </script> The ‘template’ is little more than HTML with some markup expressions inside of it that define the template language. Notice the embedded ${} expressions which reference data from the quote objects returned from an AJAX call on the server. You can embed any JavaScript or value expression in these template expressions. There are also a number of structural commands like {{if}} and {{each}} that provide for rudimentary logic inside of your templates as well as commands ({{tmpl}} and {{wrap}}) for nesting templates. You can find more about the full set of markup expressions available in the documentation. To load up this data you can use code like the following: <script type="text/javascript"> //var Proxy = new ServiceProxy("../PageMethods/PageMethodsService.asmx/"); $(document).ready(function () { $("#btnGetQuotes").click(GetQuotes); }); function GetQuotes() { var symbols = $("#txtSymbols").val().split(","); $.ajax({ url: "../PageMethods/PageMethodsService.asmx/GetStockQuotes", data: JSON.stringify({ symbols: symbols }), // parameter map type: "POST", // data has to be POSTed contentType: "application/json", timeout: 10000, dataType: "json", success: function (result) { var quotes = result.d; var jEl = $("#stockTemplate").tmpl(quotes); $("#quoteDisplay").empty().append(jEl); }, error: function (xhr, status) { alert(status + "\r\n" + xhr.responseText); } }); }; </script> In this case an ASMX AJAX service is called to retrieve the stock quotes. The service returns an array of quote objects. The result is returned as an object with the .d property (in Microsoft service style) that returns the actual array of quotes. The template is applied with: var jEl = $("#stockTemplate").tmpl(quotes); which selects the template script tag and uses the .tmpl() function to apply the data to it. The result is a jQuery matched set of elements that can then be appended to the quote display element in the page. The template is merged against an array in this example. When the result is an array the template is automatically applied to each each array item. If you pass a single data item – like say a stock quote – the template works exactly the same way but is applied only once. Templates also have access to a $data item which provides the current data item and information about the tempalte that is currently executing. This makes it possible to keep context within the context of the template itself and also to pass context from a parent template to a child template which is very powerful. Templates can be evaluated by using the template selector and calling the .tmpl() function on the jQuery matched set as shown above or you can use the static $.tmpl() function to provide a template as a string. This allows you to dynamically create templates in code or – more likely – to load templates from the server via AJAX calls. In short there are options The above shows off some of the basics, but there’s much for functionality available in the template engine. Check the documentation link for more information and links to additional examples. The plug-in download also comes with a number of examples that demonstrate functionality. jQuery templates will become a native component in jQuery Core 1.5, so it’s definitely worthwhile checking out the engine today and get familiar with this interface. As much as I’m stoked about templating becoming part of the jQuery core because it’s such an integral part of many applications, there are also a couple shortcomings in the current incarnation: Lack of Error Handling Currently if you embed an expression that is invalid it’s simply not rendered. There’s no error rendered into the template nor do the various template functions throw errors which leaves finding of bugs as a runtime exercise. I would like some mechanism – optional if possible – to be able to get error info of what is failing in a template when it’s rendered. No String Output Templates are always rendered into a jQuery matched set and there’s no way that I can see to directly render to a string. String output can be useful for debugging as well as opening up templating for creating non-HTML string output. Limited JavaScript Access Unlike John Resig’s original MicroTemplating Engine which was entirely based on JavaScript code generation these templates are limited to a few structured commands that can ‘execute’. There’s no code execution inside of script code which means you’re limited to calling expressions available in global objects or the data item passed in. This may or may not be a big deal depending on the complexity of your template logic. Error handling has been discussed quite a bit and it’s likely there will be some solution to that particualar issue by the time jQuery templates ship. The others are relatively minor issues but something to think about anyway. jQuery Data Link http://api.jquery.com/category/plugins/data-link/ jQuery Data Link provides the ability to do two-way data binding between input controls and an underlying object’s properties. The typical scenario is linking a textbox to a property of an object and have the object updated when the text in the textbox is changed and have the textbox change when the value in the object or the entire object changes. The plug-in also supports converter functions that can be applied to provide the conversion logic from string to some other value typically necessary for mapping things like textbox string input to say a number property and potentially applying additional formatting and calculations. In theory this sounds great, however in reality this plug-in has some serious usability issues. Using the plug-in you can do things like the following to bind data: person = { firstName: "rick", lastName: "strahl"}; $(document).ready( function() { // provide for two-way linking of inputs $("form").link(person); // bind to non-input elements explicitly $("#objFirst").link(person, { firstName: { name: "objFirst", convertBack: function (value, source, target) { $(target).text(value); } } }); $("#objLast").link(person, { lastName: { name: "objLast", convertBack: function (value, source, target) { $(target).text(value); } } }); }); This code hooks up two-way linking between a couple of textboxes on the page and the person object. The first line in the .ready() handler provides mapping of object to form field with the same field names as properties on the object. Note that .link() does NOT bind items into the textboxes when you call .link() – changes are mapped only when values change and you move out of the field. Strike one. The two following commands allow manual binding of values to specific DOM elements which is effectively a one-way bind. You specify the object and a then an explicit mapping where name is an ID in the document. The converter is required to explicitly assign the value to the element. Strike two. You can also detect changes to the underlying object and cause updates to the input elements bound. Unfortunately the syntax to do this is not very natural as you have to rely on the jQuery data object. To update an object’s properties and get change notification looks like this: function updateFirstName() { $(person).data("firstName", person.firstName + " (code updated)"); } This works fine in causing any linked fields to be updated. In the bindings above both the firstName input field and objFirst DOM element gets updated. But the syntax requires you to use a jQuery .data() call for each property change to ensure that the changes are tracked properly. Really? Sure you’re binding through multiple layers of abstraction now but how is that better than just manually assigning values? The code savings (if any) are going to be minimal. As much as I would like to have a WPF/Silverlight/Observable-like binding mechanism in client script, this plug-in doesn’t help much towards that goal in its current incarnation. While you can bind values, the ‘binder’ is too limited to be really useful. If initial values can’t be assigned from the mappings you’re going to end up duplicating work loading the data using some other mechanism. There’s no easy way to re-bind data with a different object altogether since updates trigger only through the .data members. Finally, any non-input elements have to be bound via code that’s fairly verbose and frankly may be more voluminous than what you might write by hand for manual binding and unbinding. Two way binding can be very useful but it has to be easy and most importantly natural. If it’s more work to hook up a binding than writing a couple of lines to do binding/unbinding this sort of thing helps very little in most scenarios. In talking to some of the developers the feature set for Data Link is not complete and they are still soliciting input for features and functionality. If you have ideas on how you want this feature to be more useful get involved and post your recommendations. As it stands, it looks to me like this component needs a lot of love to become useful. For this component to really provide value, bindings need to be able to be refreshed easily and work at the object level, not just the property level. It seems to me we would be much better served by a model binder object that can perform these binding/unbinding tasks in bulk rather than a tool where each link has to be mapped first. I also find the choice of creating a jQuery plug-in questionable – it seems a standalone object – albeit one that relies on the jQuery library – would provide a more intuitive interface than the current forcing of options onto a plug-in style interface. Out of the three Microsoft created components this is by far the least useful and least polished implementation at this point. jQuery Globalization http://github.com/jquery/jquery-global Globalization in JavaScript applications often gets short shrift and part of the reason for this is that natively in JavaScript there’s little support for formatting and parsing of numbers and dates. There are a number of JavaScript libraries out there that provide some support for globalization, but most are limited to a particular portion of globalization. As .NET developers we’re fairly spoiled by the richness of APIs provided in the framework and when dealing with client development one really notices the lack of these features. While you may not necessarily need to localize your application the globalization plug-in also helps with some basic tasks for non-localized applications: Dealing with formatting and parsing of dates and time values. Dates in particular are problematic in JavaScript as there are no formatters whatsoever except the .toString() method which outputs a verbose and next to useless long string. With the globalization plug-in you get a good chunk of the formatting and parsing functionality that the .NET framework provides on the server. You can write code like the following for example to format numbers and dates: var date = new Date(); var output = $.format(date, "MMM. dd, yy") + "\r\n" + $.format(date, "d") + "\r\n" + // 10/25/2010 $.format(1222.32213, "N2") + "\r\n" + $.format(1222.33, "c") + "\r\n"; alert(output); This becomes even more useful if you combine it with templates which can also include any JavaScript expressions. Assuming the globalization plug-in is loaded you can create template expressions that use the $.format function. Here’s the template I used earlier for the stock quote again with a couple of formats applied: <script id="stockTemplate" type="text/x-jquery-tmpl"> <div id="divStockQuote" class="errordisplay" style="width: 500px;"> <div class="label">Company:</div><div><b>${Company}(${Symbol})</b></div> <div class="label">Last Price:</div> <div>${$.format(LastPrice,"N2")}</div> <div class="label">Net Change:</div><div> {{if NetChange > 0}} <b style="color:green" >${NetChange}</b> {{else}} <b style="color:red" >${NetChange}</b> {{/if}} </div> <div class="label">Last Update:</div> <div>${$.format(LastQuoteTime,"MMM dd, yyyy")}</div> </div> </script> There are also parsing methods that can parse dates and numbers from strings into numbers easily: alert($.parseDate("25.10.2010")); alert($.parseInt("12.222")); // de-DE uses . for thousands separators As you can see culture specific options are taken into account when parsing. The globalization plugin provides rich support for a variety of locales: Get a list of all available cultures Query cultures for culture items (like currency symbol, separators etc.) Localized string names for all calendar related items (days of week, months) Generated off of .NET’s supported locales In short you get much of the same functionality that you already might be using in .NET on the server side. The plugin includes a huge number of locales and an Globalization.all.min.js file that contains the text defaults for each of these locales as well as small locale specific script files that define each of the locale specific settings. It’s highly recommended that you NOT use the huge globalization file that includes all locales, but rather add script references to only those languages you explicitly care about. Overall this plug-in is a welcome helper. Even if you use it with a single locale (like en-US) and do no other localization, you’ll gain solid support for number and date formatting which is a vital feature of many applications. Changes for Microsoft It’s good to see Microsoft coming out of its shell and away from the ‘not-built-here’ mentality that has been so pervasive in the past. It’s especially good to see it applied to jQuery – a technology that has stood in drastic contrast to Microsoft’s own internal efforts in terms of design, usage model and… popularity. It’s great to see that Microsoft is paying attention to what customers prefer to use and supporting the customer sentiment – even if it meant drastically changing course of policy and moving into a more open and sharing environment in the process. The additional jQuery support that has been introduced in the last two years certainly has made lives easier for many developers on the ASP.NET platform. It’s also nice to see Microsoft submitting proposals through the standard jQuery process of plug-ins and getting accepted for various very useful projects. Certainly the jQuery Templates plug-in is going to be very useful to many especially since it will be baked into the jQuery core in jQuery 1.5. I hope we see more of this type of involvement from Microsoft in the future. Kudos!© Rick Strahl, West Wind Technologies, 2005-2010Posted in jQuery ASP.NET

Read the article

Improving Partitioned Table Join Performance

- by Paul White

The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) ); CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY); WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50); -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE); -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory: Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); CREATE TABLE #P ( partition_number integer PRIMARY KEY); INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1; SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; DROP TABLE #P; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

Read the article

Toorcon 15 (2013)

- by danx

The Toorcon gang (senior staff): h1kari (founder), nfiltr8, and Geo Introduction to Toorcon 15 (2013) A Tale of One Software Bypass of MS Windows 8 Secure Boot Breaching SSL, One Byte at a Time Running at 99%: Surviving an Application DoS Security Response in the Age of Mass Customized Attacks x86 Rewriting: Defeating RoP and other Shinanighans Clowntown Express: interesting bugs and running a bug bounty program Active Fingerprinting of Encrypted VPNs Making Attacks Go Backwards Mask Your Checksums—The Gorry Details Adventures with weird machines thirty years after "Reflections on Trusting Trust" Introduction to Toorcon 15 (2013) Toorcon 15 is the 15th annual security conference held in San Diego. I've attended about a third of them and blogged about previous conferences I attended here starting in 2003. As always, I've only summarized the talks I attended and interested me enough to write about them. Be aware that I may have misrepresented the speaker's remarks and that they are not my remarks or opinion, or those of my employer, so don't quote me or them. Those seeking further details may contact the speakers directly or use The Google. For some talks, I have a URL for further information. A Tale of One Software Bypass of MS Windows 8 Secure Boot Andrew Furtak and Oleksandr Bazhaniuk Yuri Bulygin, Oleksandr ("Alex") Bazhaniuk, and (not present) Andrew Furtak Yuri and Alex talked about UEFI and Bootkits and bypassing MS Windows 8 Secure Boot, with vendor recommendations. They previously gave this talk at the BlackHat 2013 conference. MS Windows 8 Secure Boot Overview UEFI (Unified Extensible Firmware Interface) is interface between hardware and OS. UEFI is processor and architecture independent. Malware can replace bootloader (bootx64.efi, bootmgfw.efi). Once replaced can modify kernel. Trivial to replace bootloader. Today many legacy bootkits—UEFI replaces them most of them. MS Windows 8 Secure Boot verifies everything you load, either through signatures or hashes. UEFI firmware relies on secure update (with signed update). You would think Secure Boot would rely on ROM (such as used for phones0, but you can't do that for PCs—PCs use writable memory with signatures DXE core verifies the UEFI boat loader(s) OS Loader (winload.efi, winresume.efi) verifies the OS kernel A chain of trust is established with a root key (Platform Key, PK), which is a cert belonging to the platform vendor. Key Exchange Keys (KEKs) verify an "authorized" database (db), and "forbidden" database (dbx). X.509 certs with SHA-1/SHA-256 hashes. Keys are stored in non-volatile (NV) flash-based NVRAM. Boot Services (BS) allow adding/deleting keys (can't be accessed once OS starts—which uses Run-Time (RT)). Root cert uses RSA-2048 public keys and PKCS#7 format signatures. SecureBoot — enable disable image signature checks SetupMode — update keys, self-signed keys, and secure boot variables CustomMode — allows updating keys Secure Boot policy settings are: always execute, never execute, allow execute on security violation, defer execute on security violation, deny execute on security violation, query user on security violation Attacking MS Windows 8 Secure Boot Secure Boot does NOT protect from physical access. Can disable from console. Each BIOS vendor implements Secure Boot differently. There are several platform and BIOS vendors. It becomes a "zoo" of implementations—which can be taken advantage of. Secure Boot is secure only when all vendors implement it correctly. Allow only UEFI firmware signed updates protect UEFI firmware from direct modification in flash memory protect FW update components program SPI controller securely protect secure boot policy settings in nvram protect runtime api disable compatibility support module which allows unsigned legacy Can corrupt the Platform Key (PK) EFI root certificate variable in SPI flash. If PK is not found, FW enters setup mode wich secure boot turned off. Can also exploit TPM in a similar manner. One is not supposed to be able to directly modify the PK in SPI flash from the OS though. But they found a bug that they can exploit from User Mode (undisclosed) and demoed the exploit. It loaded and ran their own bootkit. The exploit requires a reboot. Multiple vendors are vulnerable. They will disclose this exploit to vendors in the future. Recommendations: allow only signed updates protect UEFI fw in ROM protect EFI variable store in ROM Breaching SSL, One Byte at a Time Yoel Gluck and Angelo Prado Angelo Prado and Yoel Gluck, Salesforce.com CRIME is software that performs a "compression oracle attack." This is possible because the SSL protocol doesn't hide length, and because SSL compresses the header. CRIME requests with every possible character and measures the ciphertext length. Look for the plaintext which compresses the most and looks for the cookie one byte-at-a-time. SSL Compression uses LZ77 to reduce redundancy. Huffman coding replaces common byte sequences with shorter codes. US CERT thinks the SSL compression problem is fixed, but it isn't. They convinced CERT that it wasn't fixed and they issued a CVE. BREACH, breachattrack.com BREACH exploits the SSL response body (Accept-Encoding response, Content-Encoding). It takes advantage of the fact that the response is not compressed. BREACH uses gzip and needs fairly "stable" pages that are static for ~30 seconds. It needs attacker-supplied content (say from a web form or added to a URL parameter). BREACH listens to a session's requests and responses, then inserts extra requests and responses. Eventually, BREACH guesses a session's secret key. Can use compression to guess contents one byte at-a-time. For example, "Supersecret SupersecreX" (a wrong guess) compresses 10 bytes, and "Supersecret Supersecret" (a correct guess) compresses 11 bytes, so it can find each character by guessing every character. To start the guess, BREACH needs at least three known initial characters in the response sequence. Compression length then "leaks" information. Some roadblocks include no winners (all guesses wrong) or too many winners (multiple possibilities that compress the same). The solutions include: lookahead (guess 2 or 3 characters at-a-time instead of 1 character). Expensive rollback to last known conflict check compression ratio can brute-force first 3 "bootstrap" characters, if needed (expensive) block ciphers hide exact plain text length. Solution is to align response in advance to block size Mitigations length: use variable padding secrets: dynamic CSRF tokens per request secret: change over time separate secret to input-less servlets Future work eiter understand DEFLATE/GZIP HTTPS extensions Running at 99%: Surviving an Application DoS Ryan Huber Ryan Huber, Risk I/O Ryan first discussed various ways to do a denial of service (DoS) attack against web services. One usual method is to find a slow web page and do several wgets. Or download large files. Apache is not well suited at handling a large number of connections, but one can put something in front of it Can use Apache alternatives, such as nginx How to identify malicious hosts short, sudden web requests user-agent is obvious (curl, python) same url requested repeatedly no web page referer (not normal) hidden links. hide a link and see if a bot gets it restricted access if not your geo IP (unless the website is global) missing common headers in request regular timing first seen IP at beginning of attack count requests per hosts (usually a very large number) Use of captcha can mitigate attacks, but you'll lose a lot of genuine users. Bouncer, goo.gl/c2vyEc and www.github.com/rawdigits/Bouncer Bouncer is software written by Ryan in netflow. Bouncer has a small, unobtrusive footprint and detects DoS attempts. It closes blacklisted sockets immediately (not nice about it, no proper close connection). Aggregator collects requests and controls your web proxies. Need NTP on the front end web servers for clean data for use by bouncer. Bouncer is also useful for a popularity storm ("Slashdotting") and scraper storms. Future features: gzip collection data, documentation, consumer library, multitask, logging destroyed connections. Takeaways: DoS mitigation is easier with a complete picture Bouncer designed to make it easier to detect and defend DoS—not a complete cure Security Response in the Age of Mass Customized Attacks Peleus Uhley and Karthik Raman Peleus Uhley and Karthik Raman, Adobe ASSET, blogs.adobe.com/asset/ Peleus and Karthik talked about response to mass-customized exploits. Attackers behave much like a business. "Mass customization" refers to concept discussed in the book Future Perfect by Stan Davis of Harvard Business School. Mass customization is differentiating a product for an individual customer, but at a mass production price. For example, the same individual with a debit card receives basically the same customized ATM experience around the world. Or designing your own PC from commodity parts. Exploit kits are another example of mass customization. The kits support multiple browsers and plugins, allows new modules. Exploit kits are cheap and customizable. Organized gangs use exploit kits. A group at Berkeley looked at 77,000 malicious websites (Grier et al., "Manufacturing Compromise: The Emergence of Exploit-as-a-Service", 2012). They found 10,000 distinct binaries among them, but derived from only a dozen or so exploit kits. Characteristics of Mass Malware: potent, resilient, relatively low cost Technical characteristics: multiple OS, multipe payloads, multiple scenarios, multiple languages, obfuscation Response time for 0-day exploits has gone down from ~40 days 5 years ago to about ~10 days now. So the drive with malware is towards mass customized exploits, to avoid detection There's plenty of evicence that exploit development has Project Manager bureaucracy. They infer from the malware edicts to: support all versions of reader support all versions of windows support all versions of flash support all browsers write large complex, difficult to main code (8750 lines of JavaScript for example Exploits have "loose coupling" of multipe versions of software (adobe), OS, and browser. This allows specific attacks against specific versions of multiple pieces of software. Also allows exploits of more obscure software/OS/browsers and obscure versions. Gave examples of exploits that exploited 2, 3, 6, or 14 separate bugs. However, these complete exploits are more likely to be buggy or fragile in themselves and easier to defeat. Future research includes normalizing malware and Javascript. Conclusion: The coming trend is that mass-malware with mass zero-day attacks will result in mass customization of attacks. x86 Rewriting: Defeating RoP and other Shinanighans Richard Wartell Richard Wartell The attack vector we are addressing here is: First some malware causes a buffer overflow. The malware has no program access, but input access and buffer overflow code onto stack Later the stack became non-executable. The workaround malware used was to write a bogus return address to the stack jumping to malware Later came ASLR (Address Space Layout Randomization) to randomize memory layout and make addresses non-deterministic. The workaround malware used was to jump t existing code segments in the program that can be used in bad ways "RoP" is Return-oriented Programming attacks. RoP attacks use your own code and write return address on stack to (existing) expoitable code found in program ("gadgets"). Pinkie Pie was paid $60K last year for a RoP attack. One solution is using anti-RoP compilers that compile source code with NO return instructions. ASLR does not randomize address space, just "gadgets". IPR/ILR ("Instruction Location Randomization") randomizes each instruction with a virtual machine. Richard's goal was to randomize a binary with no source code access. He created "STIR" (Self-Transofrming Instruction Relocation). STIR disassembles binary and operates on "basic blocks" of code. The STIR disassembler is conservative in what to disassemble. Each basic block is moved to a random location in memory. Next, STIR writes new code sections with copies of "basic blocks" of code in randomized locations. The old code is copied and rewritten with jumps to new code. the original code sections in the file is marked non-executible. STIR has better entropy than ASLR in location of code. Makes brute force attacks much harder. STIR runs on MS Windows (PEM) and Linux (ELF). It eliminated 99.96% or more "gadgets" (i.e., moved the address). Overhead usually 5-10% on MS Windows, about 1.5-4% on Linux (but some code actually runs faster!). The unique thing about STIR is it requires no source access and the modified binary fully works! Current work is to rewrite code to enforce security policies. For example, don't create a *.{exe,msi,bat} file. Or don't connect to the network after reading from the disk. Clowntown Express: interesting bugs and running a bug bounty program Collin Greene Collin Greene, Facebook Collin talked about Facebook's bug bounty program. Background at FB: FB has good security frameworks, such as security teams, external audits, and cc'ing on diffs. But there's lots of "deep, dark, forgotten" parts of legacy FB code. Collin gave several examples of bountied bugs. Some bounty submissions were on software purchased from a third-party (but bounty claimers don't know and don't care). We use security questions, as does everyone else, but they are basically insecure (often easily discoverable). Collin didn't expect many bugs from the bounty program, but they ended getting 20+ good bugs in first 24 hours and good submissions continue to come in. Bug bounties bring people in with different perspectives, and are paid only for success. Bug bounty is a better use of a fixed amount of time and money versus just code review or static code analysis. The Bounty program started July 2011 and paid out $1.5 million to date. 14% of the submissions have been high priority problems that needed to be fixed immediately. The best bugs come from a small % of submitters (as with everything else)—the top paid submitters are paid 6 figures a year. Spammers like to backstab competitors. The youngest sumitter was 13. Some submitters have been hired. Bug bounties also allows to see bugs that were missed by tools or reviews, allowing improvement in the process. Bug bounties might not work for traditional software companies where the product has release cycle or is not on Internet. Active Fingerprinting of Encrypted VPNs Anna Shubina Anna Shubina, Dartmouth Institute for Security, Technology, and Society (I missed the start of her talk because another track went overtime. But I have the DVD of the talk, so I'll expand later) IPsec leaves fingerprints. Using netcat, one can easily visually distinguish various crypto chaining modes just from packet timing on a chart (example, DES-CBC versus AES-CBC) One can tell a lot about VPNs just from ping roundtrips (such as what router is used) Delayed packets are not informative about a network, especially if far away from the network More needed to explore about how TCP works in real life with respect to timing Making Attacks Go Backwards Fuzzynop FuzzyNop, Mandiant This talk is not about threat attribution (finding who), product solutions, politics, or sales pitches. But who are making these malware threats? It's not a single person or group—they have diverse skill levels. There's a lot of fat-fingered fumblers out there. Always look for low-hanging fruit first: "hiding" malware in the temp, recycle, or root directories creation of unnamed scheduled tasks obvious names of files and syscalls ("ClearEventLog") uncleared event logs. Clearing event log in itself, and time of clearing, is a red flag and good first clue to look for on a suspect system Reverse engineering is hard. Disassembler use takes practice and skill. A popular tool is IDA Pro, but it takes multiple interactive iterations to get a clean disassembly. Key loggers are used a lot in targeted attacks. They are typically custom code or built in a backdoor. A big tip-off is that non-printable characters need to be printed out (such as "[Ctrl]" "[RightShift]") or time stamp printf strings. Look for these in files. Presence is not proof they are used. Absence is not proof they are not used. Java exploits. Can parse jar file with idxparser.py and decomile Java file. Java typially used to target tech companies. Backdoors are the main persistence mechanism (provided externally) for malware. Also malware typically needs command and control. Application of Artificial Intelligence in Ad-Hoc Static Code Analysis John Ashaman John Ashaman, Security Innovation Initially John tried to analyze open source files with open source static analysis tools, but these showed thousands of false positives. Also tried using grep, but tis fails to find anything even mildly complex. So next John decided to write his own tool. His approach was to first generate a call graph then analyze the graph. However, the problem is that making a call graph is really hard. For example, one problem is "evil" coding techniques, such as passing function pointer. First the tool generated an Abstract Syntax Tree (AST) with the nodes created from method declarations and edges created from method use. Then the tool generated a control flow graph with the goal to find a path through the AST (a maze) from source to sink. The algorithm is to look at adjacent nodes to see if any are "scary" (a vulnerability), using heuristics for search order. The tool, called "Scat" (Static Code Analysis Tool), currently looks for C# vulnerabilities and some simple PHP. Later, he plans to add more PHP, then JSP and Java. For more information see his posts in Security Innovation blog and NRefactory on GitHub. Mask Your Checksums—The Gorry Details Eric (XlogicX) Davisson Eric (XlogicX) Davisson Sometimes in emailing or posting TCP/IP packets to analyze problems, you may want to mask the IP address. But to do this correctly, you need to mask the checksum too, or you'll leak information about the IP. Problem reports found in stackoverflow.com, sans.org, and pastebin.org are usually not masked, but a few companies do care. If only the IP is masked, the IP may be guessed from checksum (that is, it leaks data). Other parts of packet may leak more data about the IP. TCP and IP checksums both refer to the same data, so can get more bits of information out of using both checksums than just using one checksum. Also, one can usually determine the OS from the TTL field and ports in a packet header. If we get hundreds of possible results (16x each masked nibble that is unknown), one can do other things to narrow the results, such as look at packet contents for domain or geo information. With hundreds of results, can import as CSV format into a spreadsheet. Can corelate with geo data and see where each possibility is located. Eric then demoed a real email report with a masked IP packet attached. Was able to find the exact IP address, given the geo and university of the sender. Point is if you're going to mask a packet, do it right. Eric wouldn't usually bother, but do it correctly if at all, to not create a false impression of security. Adventures with weird machines thirty years after "Reflections on Trusting Trust" Sergey Bratus Sergey Bratus, Dartmouth College (and Julian Bangert and Rebecca Shapiro, not present) "Reflections on Trusting Trust" refers to Ken Thompson's classic 1984 paper. "You can't trust code that you did not totally create yourself." There's invisible links in the chain-of-trust, such as "well-installed microcode bugs" or in the compiler, and other planted bugs. Thompson showed how a compiler can introduce and propagate bugs in unmodified source. But suppose if there's no bugs and you trust the author, can you trust the code? Hell No! There's too many factors—it's Babylonian in nature. Why not? Well, Input is not well-defined/recognized (code's assumptions about "checked" input will be violated (bug/vunerabiliy). For example, HTML is recursive, but Regex checking is not recursive. Input well-formed but so complex there's no telling what it does For example, ELF file parsing is complex and has multiple ways of parsing. Input is seen differently by different pieces of program or toolchain Any Input is a program input executes on input handlers (drives state changes & transitions) only a well-defined execution model can be trusted (regex/DFA, PDA, CFG) Input handler either is a "recognizer" for the inputs as a well-defined language (see langsec.org) or it's a "virtual machine" for inputs to drive into pwn-age ELF ABI (UNIX/Linux executible file format) case study. Problems can arise from these steps (without planting bugs): compiler linker loader ld.so/rtld relocator DWARF (debugger info) exceptions The problem is you can't really automatically analyze code (it's the "halting problem" and undecidable). Only solution is to freeze code and sign it. But you can't freeze everything! Can't freeze ASLR or loading—must have tables and metadata. Any sufficiently complex input data is the same as VM byte code Example, ELF relocation entries + dynamic symbols == a Turing Complete Machine (TM). @bxsays created a Turing machine in Linux from relocation data (not code) in an ELF file. For more information, see Rebecca "bx" Shapiro's presentation from last year's Toorcon, "Programming Weird Machines with ELF Metadata" @bxsays did same thing with Mach-O bytecode Or a DWARF exception handling data .eh_frame + glibc == Turning Machine X86 MMU (IDT, GDT, TSS): used address translation to create a Turning Machine. Page handler reads and writes (on page fault) memory. Uses a page table, which can be used as Turning Machine byte code. Example on Github using this TM that will fly a glider across the screen Next Sergey talked about "Parser Differentials". That having one input format, but two parsers, will create confusion and opportunity for exploitation. For example, CSRs are parsed during creation by cert requestor and again by another parser at the CA. Another example is ELF—several parsers in OS tool chain, which are all different. Can have two different Program Headers (PHDRs) because ld.so parses multiple PHDRs. The second PHDR can completely transform the executable. This is described in paper in the first issue of International Journal of PoC. Conclusions trusting computers not only about bugs! Bugs are part of a problem, but no by far all of it complex data formats means bugs no "chain of trust" in Babylon! (that is, with parser differentials) we need to squeeze complexity out of data until data stops being "code equivalent" Further information See and langsec.org. USENIX WOOT 2013 (Workshop on Offensive Technologies) for "weird machines" papers and videos.

Read the article

ASP.NET Frameworks and Raw Throughput Performance

- by Rick Strahl

A few days ago I had a curious thought: With all these different technologies that the ASP.NET stack has to offer, what's the most efficient technology overall to return data for a server request? When I started this it was mere curiosity rather than a real practical need or result. Different tools are used for different problems and so performance differences are to be expected. But still I was curious to see how the various technologies performed relative to each just for raw throughput of the request getting to the endpoint and back out to the client with as little processing in the actual endpoint logic as possible (aka Hello World!). I want to clarify that this is merely an informal test for my own curiosity and I'm sharing the results and process here because I thought it was interesting. It's been a long while since I've done any sort of perf testing on ASP.NET, mainly because I've not had extremely heavy load requirements and because overall ASP.NET performs very well even for fairly high loads so that often it's not that critical to test load performance. This post is not meant to make a point or even come to a conclusion which tech is better, but just to act as a reference to help understand some of the differences in perf and give a starting point to play around with this yourself. I've included the code for this simple project, so you can play with it and maybe add a few additional tests for different things if you like. Source Code on GitHub I looked at this data for these technologies: ASP.NET Web API ASP.NET MVC WebForms ASP.NET WebPages ASMX AJAX Services (couldn't get AJAX/JSON to run on IIS8 ) WCF Rest Raw ASP.NET HttpHandlers It's quite a mixed bag, of course and the technologies target different types of development. What started out as mere curiosity turned into a bit of a head scratcher as the results were sometimes surprising. What I describe here is more to satisfy my curiosity more than anything and I thought it interesting enough to discuss on the blog :-) First test: Raw Throughput The first thing I did is test raw throughput for the various technologies. This is the least practical test of course since you're unlikely to ever create the equivalent of a 'Hello World' request in a real life application. The idea here is to measure how much time a 'NOP' request takes to return data to the client. So for this request I create the simplest Hello World request that I could come up for each tech. Http Handler The first is the lowest level approach which is an HTTP handler. public class Handler : IHttpHandler { public void ProcessRequest(HttpContext context) { context.Response.ContentType = "text/plain"; context.Response.Write("Hello World. Time is: " + DateTime.Now.ToString()); } public bool IsReusable { get { return true; } } } WebForms Next I added a couple of ASPX pages - one using CodeBehind and one using only a markup page. The CodeBehind page simple does this in CodeBehind without any markup in the ASPX page: public partial class HelloWorld_CodeBehind : System.Web.UI.Page { protected void Page_Load(object sender, EventArgs e) { Response.Write("Hello World. Time is: " + DateTime.Now.ToString() ); Response.End(); } } while the Markup page only contains some static output via an expression:<%@ Page Language="C#" AutoEventWireup="false" CodeBehind="HelloWorld_Markup.aspx.cs" Inherits="AspNetFrameworksPerformance.HelloWorld_Markup" %> Hello World. Time is <%= DateTime.Now %> ASP.NET WebPages WebPages is the freestanding Razor implementation of ASP.NET. Here's the simple HelloWorld.cshtml page:Hello World @DateTime.Now WCF REST WCF REST was the token REST implementation for ASP.NET before WebAPI and the inbetween step from ASP.NET AJAX. I'd like to forget that this technology was ever considered for production use, but I'll include it here. Here's an OperationContract class: [ServiceContract(Namespace = "")] [AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)] public class WcfService { [OperationContract] [WebGet] public Stream HelloWorld() { var data = Encoding.Unicode.GetBytes("Hello World" + DateTime.Now.ToString()); var ms = new MemoryStream(data); // Add your operation implementation here return ms; } } WCF REST can return arbitrary results by returning a Stream object and a content type. The code above turns the string result into a stream and returns that back to the client. ASP.NET AJAX (ASMX Services) I also wanted to test ASP.NET AJAX services because prior to WebAPI this is probably still the most widely used AJAX technology for the ASP.NET stack today. Unfortunately I was completely unable to get this running on my Windows 8 machine. Visual Studio 2012 removed adding of ASP.NET AJAX services, and when I tried to manually add the service and configure the script handler references it simply did not work - I always got a SOAP response for GET and POST operations. No matter what I tried I always ended up getting XML results even when explicitly adding the ScriptHandler. So, I didn't test this (but the code is there - you might be able to test this on a Windows 7 box). ASP.NET MVC Next up is probably the most popular ASP.NET technology at the moment: MVC. Here's the small controller: public class MvcPerformanceController : Controller { public ActionResult Index() { return View(); } public ActionResult HelloWorldCode() { return new ContentResult() { Content = "Hello World. Time is: " + DateTime.Now.ToString() }; } } ASP.NET WebAPI Next up is WebAPI which looks kind of similar to MVC. Except here I have to use a StringContent result to return the response: public class WebApiPerformanceController : ApiController { [HttpGet] public HttpResponseMessage HelloWorldCode() { return new HttpResponseMessage() { Content = new StringContent("Hello World. Time is: " + DateTime.Now.ToString(), Encoding.UTF8, "text/plain") }; } } Testing Take a minute to think about each of the technologies… and take a guess which you think is most efficient in raw throughput. The fastest should be pretty obvious, but the others - maybe not so much. The testing I did is pretty informal since it was mainly to satisfy my curiosity - here's how I did this: I used Apache Bench (ab.exe) from a full Apache HTTP installation to run and log the test results of hitting the server. ab.exe is a small executable that lets you hit a URL repeatedly and provides counter information about the number of requests, requests per second etc. ab.exe and the batch file are located in the \LoadTests folder of the project. An ab.exe command line looks like this: ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorld which hits the specified URL 100,000 times with a load factor of 20 concurrent requests. This results in output like this: It's a great way to get a quick and dirty performance summary. Run it a few times to make sure there's not a large amount of varience. You might also want to do an IISRESET to clear the Web Server. Just make sure you do a short test run to warm up the server first - otherwise your first run is likely to be skewed downwards. ab.exe also allows you to specify headers and provide POST data and many other things if you want to get a little more fancy. Here all tests are GET requests to keep it simple. I ran each test: 100,000 iterations Load factor of 20 concurrent connections IISReset before starting A short warm up run for API and MVC to make sure startup cost is mitigated Here is the batch file I used for the test: IISRESET REM make sure you add REM C:\Program Files (x86)\Apache Software Foundation\Apache2.2\bin REM to your path so ab.exe can be found REM Warm up ab.exe -n100 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldJsonab.exe -n100 -c20 http://localhost/aspnetperf/api/HelloWorldJson ab.exe -n100 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorld ab.exe -n100000 -c20 http://localhost/aspnetperf/handler.ashx > handler.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/HelloWorld_CodeBehind.aspx > AspxCodeBehind.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/HelloWorld_Markup.aspx > AspxMarkup.txt ab.exe -n100000 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorld > Wcf.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldCode > Mvc.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorld > WebApi.txt I ran each of these tests 3 times and took the average score for Requests/second, with the machine otherwise idle. I did see a bit of variance when running many tests but the values used here are the medians. Part of this has to do with the fact I ran the tests on my local machine - result would probably more consistent running the load test on a separate machine hitting across the network. I ran these tests locally on my laptop which is a Dell XPS with quad core Sandibridge I7-2720QM @ 2.20ghz and a fast SSD drive on Windows 8. CPU load during tests ran to about 70% max across all 4 cores (IOW, it wasn't overloading the machine). Ideally you can try running these tests on a separate machine hitting the local machine. If I remember correctly IIS 7 and 8 on client OSs don't throttle so the performance here should be Results Ok, let's cut straight to the chase. Below are the results from the tests… It's not surprising that the handler was fastest. But it was a bit surprising to me that the next fastest was WebForms and especially Web Forms with markup over a CodeBehind page. WebPages also fared fairly well. MVC and WebAPI are a little slower and the slowest by far is WCF REST (which again I find surprising). As mentioned at the start the raw throughput tests are not overly practical as they don't test scripting performance for the HTML generation engines or serialization performances of the data engines. All it really does is give you an idea of the raw throughput for the technology from time of request to reaching the endpoint and returning minimal text data back to the client which indicates full round trip performance. But it's still interesting to see that Web Forms performs better in throughput than either MVC, WebAPI or WebPages. It'd be interesting to try this with a few pages that actually have some parsing logic on it, but that's beyond the scope of this throughput test. But what's also amazing about this test is the sheer amount of traffic that a laptop computer is handling. Even the slowest tech managed 5700 requests a second, which is one hell of a lot of requests if you extrapolate that out over a 24 hour period. Remember these are not static pages, but dynamic requests that are being served. Another test - JSON Data Service Results The second test I used a JSON result from several of the technologies. I didn't bother running WebForms and WebPages through this test since that doesn't make a ton of sense to return data from the them (OTOH, returning text from the APIs didn't make a ton of sense either :-) In these tests I have a small Person class that gets serialized and then returned to the client. The Person class looks like this: public class Person { public Person() { Id = 10; Name = "Rick"; Entered = DateTime.Now; } public int Id { get; set; } public string Name { get; set; } public DateTime Entered { get; set; } } Here are the updated handler classes that use Person: Handler public class Handler : IHttpHandler { public void ProcessRequest(HttpContext context) { var action = context.Request.QueryString["action"]; if (action == "json") JsonRequest(context); else TextRequest(context); } public void TextRequest(HttpContext context) { context.Response.ContentType = "text/plain"; context.Response.Write("Hello World. Time is: " + DateTime.Now.ToString()); } public void JsonRequest(HttpContext context) { var json = JsonConvert.SerializeObject(new Person(), Formatting.None); context.Response.ContentType = "application/json"; context.Response.Write(json); } public bool IsReusable { get { return true; } } } This code adds a little logic to check for a action query string and route the request to an optional JSON result method. To generate JSON, I'm using the same JSON.NET serializer (JsonConvert.SerializeObject) used in Web API to create the JSON response. WCF REST [ServiceContract(Namespace = "")] [AspNetCompatibilityRequirements(RequirementsMode = AspNetCompatibilityRequirementsMode.Allowed)] public class WcfService { [OperationContract] [WebGet] public Stream HelloWorld() { var data = Encoding.Unicode.GetBytes("Hello World " + DateTime.Now.ToString()); var ms = new MemoryStream(data); // Add your operation implementation here return ms; } [OperationContract] [WebGet(ResponseFormat=WebMessageFormat.Json,BodyStyle=WebMessageBodyStyle.WrappedRequest)] public Person HelloWorldJson() { // Add your operation implementation here return new Person(); } } For WCF REST all I have to do is add a method with the Person result type. ASP.NET MVC public class MvcPerformanceController : Controller { // // GET: /MvcPerformance/ public ActionResult Index() { return View(); } public ActionResult HelloWorldCode() { return new ContentResult() { Content = "Hello World. Time is: " + DateTime.Now.ToString() }; } public JsonResult HelloWorldJson() { return Json(new Person(), JsonRequestBehavior.AllowGet); } } For MVC all I have to do for a JSON response is return a JSON result. ASP.NET internally uses JavaScriptSerializer. ASP.NET WebAPI public class WebApiPerformanceController : ApiController { [HttpGet] public HttpResponseMessage HelloWorldCode() { return new HttpResponseMessage() { Content = new StringContent("Hello World. Time is: " + DateTime.Now.ToString(), Encoding.UTF8, "text/plain") }; } [HttpGet] public Person HelloWorldJson() { return new Person(); } [HttpGet] public HttpResponseMessage HelloWorldJson2() { var response = new HttpResponseMessage(HttpStatusCode.OK); response.Content = new ObjectContent<Person>(new Person(), GlobalConfiguration.Configuration.Formatters.JsonFormatter); return response; } } Testing and Results To run these data requests I used the following ab.exe commands:REM JSON RESPONSES ab.exe -n100000 -c20 http://localhost/aspnetperf/Handler.ashx?action=json > HandlerJson.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/MvcPerformance/HelloWorldJson > MvcJson.txt ab.exe -n100000 -c20 http://localhost/aspnetperf/api/HelloWorldJson > WebApiJson.txt ab.exe -n100000 -c20 http://localhost/AspNetPerf/WcfService.svc/HelloWorldJson > WcfJson.txt The results from this test run are a bit interesting in that the WebAPI test improved performance significantly over returning plain string content. Here are the results: The performance for each technology drops a little bit except for WebAPI which is up quite a bit! From this test it appears that WebAPI is actually significantly better performing returning a JSON response, rather than a plain string response. Snag with Apache Benchmark and 'Length Failures' I ran into a little snag with Apache Benchmark, which was reporting failures for my Web API requests when serializing. As the graph shows performance improved significantly from with JSON results from 5580 to 6530 or so which is a 15% improvement (while all others slowed down by 3-8%). However, I was skeptical at first because the WebAPI test reports showed a bunch of errors on about 10% of the requests. Check out this report: Notice the Failed Request count. What the hey? Is WebAPI failing on roughly 10% of requests when sending JSON? Turns out: No it's not! But it took some sleuthing to figure out why it reports these failures. At first I thought that Web API was failing, and so to make sure I re-ran the test with Fiddler attached and runiisning the ab.exe test by using the -X switch: ab.exe -n100 -c10 -X localhost:8888 http://localhost/aspnetperf/api/HelloWorldJson which showed that indeed all requests where returning proper HTTP 200 results with full content. However ab.exe was reporting the errors. After some closer inspection it turned out that the dates varying in size altered the response length in dynamic output. For example: these two results: {"Id":10,"Name":"Rick","Entered":"2012-09-04T10:57:24.841926-10:00"} {"Id":10,"Name":"Rick","Entered":"2012-09-04T10:57:24.8519262-10:00"} are different in length for the number which results in 68 and 69 bytes respectively. The same URL produces different result lengths which is what ab.exe reports. I didn't notice at first bit the same is happening when running the ASHX handler with JSON.NET result since it uses the same serializer that varies the milliseconds. Moral: You can typically ignore Length failures in Apache Benchmark and when in doubt check the actual output with Fiddler. Note that the other failure values are accurate though. Another interesting Side Note: Perf drops over Time As I was running these tests repeatedly I was finding that performance steadily dropped from a startup peak to a 10-15% lower stable level. IOW, with Web API I'd start out with around 6500 req/sec and in subsequent runs it keeps dropping until it would stabalize somewhere around 5900 req/sec occasionally jumping lower. For these tests this is why I did the IIS RESET and warm up for individual tests. This is a little puzzling. Looking at Process Monitor while the test are running memory very quickly levels out as do handles and threads, on the first test run. Subsequent runs everything stays stable, but the performance starts going downwards. This applies to all the technologies - Handlers, Web Forms, MVC, Web API - curious to see if others test this and see similar results. Doing an IISRESET then resets everything and performance starts off at peak again… Summary As I stated at the outset, these were informal to satiate my curiosity not to prove that any technology is better or even faster than another. While there clearly are differences in performance the differences (other than WCF REST which was by far the slowest and the raw handler which was by far the highest) are relatively minor, so there is no need to feel that any one technology is a runaway standout in raw performance. Choosing a technology is about more than pure performance but also about the adequateness for the job and the easy of implementation. The strengths of each technology will make for any minor performance difference we see in these tests. However, to me it's important to get an occasional reality check and compare where new technologies are heading. Often times old stuff that's been optimized and designed for a time of less horse power can utterly blow the doors off newer tech and simple checks like this let you compare. Luckily we're seeing that much of the new stuff performs well even in V1.0 which is great. To me it was very interesting to see Web API perform relatively badly with plain string content, which originally led me to think that Web API might not be properly optimized just yet. For those that caught my Tweets late last week regarding WebAPI's slow responses was with String content which is in fact considerably slower. Luckily where it counts with serialized JSON and XML WebAPI actually performs better. But I do wonder what would make generic string content slower than serialized code? This stresses another point: Don't take a single test as the final gospel and don't extrapolate out from a single set of tests. Certainly Twitter can make you feel like a fool when you post something immediate that hasn't been fleshed out a little more <blush>. Egg on my face. As a result I ended up screwing around with this for a few hours today to compare different scenarios. Well worth the time… I hope you found this useful, if not for the results, maybe for the process of quickly testing a few requests for performance and charting out a comparison. Now onwards with more serious stuff… Resources Source Code on GitHub Apache HTTP Server Project (ab.exe is part of the binary distribution)© Rick Strahl, West Wind Technologies, 2005-2012Posted in ASP.NET Web Api Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article

ANTS Memory Profiler 7.0 Review

- by Michael B. McLaughlin

(This is my first review as a part of the GeeksWithBlogs.net Influencers program. It’s a program in which I (and the others who have been selected for it) get the opportunity to check out new products and services and write reviews about them. We don’t get paid for this, but we do generally get to keep a copy of the software or retain an account for some period of time on the service that we review. In this case I received a copy of Red Gate Software’s ANTS Memory Profiler 7.0, which was released in January. I don’t have any upgrade rights nor is my review guided, restrained, influenced, or otherwise controlled by Red Gate or anyone else. But I do get to keep the software license. I will always be clear about what I received whenever I do a review – I leave it up to you to decide whether you believe I can be objective. I believe I can be. If I used something and really didn’t like it, keeping a copy of it wouldn’t be worth anything to me. In that case though, I would simply uninstall/deactivate/whatever the software or service and tell the company what I didn’t like about it so they could (hopefully) make it better in the future. I don’t think it’d be polite to write up a terrible review, nor do I think it would be a particularly good use of my time. There are people who get paid for a living to review things, so I leave it to them to tell you what they think is bad and why. I’ll only spend my time telling you about things I think are good.) Overview of Common .NET Memory Problems When coming to land of managed memory from the wilds of unmanaged code, it’s easy to say to one’s self, “Wow! Now I never have to worry about memory problems again!” But this simply isn’t true. Managed code environments, such as .NET, make many, many things easier. You will never have to worry about memory corruption due to a bad pointer, for example (unless you’re working with unsafe code, of course). But managed code has its own set of memory concerns. For example, failing to unsubscribe from events when you are done with them leaves the publisher of an event with a reference to the subscriber. If you eliminate all your own references to the subscriber, then that memory is effectively lost since the GC won’t delete it because of the publishing object’s reference. When the publishing object itself becomes subject to garbage collection then you’ll get that memory back finally, but that could take a very long time depending of the life of the publisher. Another common source of resource leaks is failing to properly release unmanaged resources. When writing a class that contains members that hold unmanaged resources (e.g. any of the Stream-derived classes, IsolatedStorageFile, most classes ending in “Reader” or “Writer”), you should always implement IDisposable, making sure to use a properly written Dispose method. And when you are using an instance of a class that implements IDisposable, you should always make sure to use a 'using' statement in order to ensure that the object’s unmanaged resources are disposed of properly. (A ‘using’ statement is a nicer, cleaner looking, and easier to use version of a try-finally block. The compiler actually translates it as though it were a try-finally block. Note that Code Analysis warning 2202 (CA2202) will often be triggered by nested using blocks. A properly written dispose method ensures that it only runs once such that calling dispose multiple times should not be a problem. Nonetheless, CA2202 exists and if you want to avoid triggering it then you should write your code such that only the innermost IDisposable object uses a ‘using’ statement, with any outer code making use of appropriate try-finally blocks instead). Then, of course, there are situations where you are operating in a memory-constrained environment or else you want to limit or even eliminate allocations within a certain part of your program (e.g. within the main game loop of an XNA game) in order to avoid having the GC run. On the Xbox 360 and Windows Phone 7, for example, for every 1 MB of heap allocations you make, the GC runs; the added time of a GC collection can cause a game to drop frames or run slowly thereby making it look bad. Eliminating allocations (or else minimizing them and calling an explicit Collect at an appropriate time) is a common way of avoiding this (the other way is to simplify your heap so that the GC’s latency is low enough not to cause performance issues). ANTS Memory Profiler 7.0 When the opportunity to review Red Gate’s recently released ANTS Memory Profiler 7.0 arose, I jumped at it. In order to review it, I was given a free copy (which does not include upgrade rights for future versions) which I am allowed to keep. For those of you who are familiar with ANTS Memory Profiler, you can find a list of new features and enhancements here. If you are an experienced .NET developer who is familiar with .NET memory management issues, ANTS Memory Profiler is great. More importantly still, if you are new to .NET development or you have no experience or limited experience with memory profiling, ANTS Memory Profiler is awesome. From the very beginning, it guides you through the process of memory profiling. If you’re experienced and just want dive in however, it doesn’t get in your way. The help items GAHSFLASHDAJLDJA are well designed and located right next to the UI controls so that they are easy to find without being intrusive. When you first launch it, it presents you with a “Getting Started” screen that contains links to “Memory profiling video tutorials”, “Strategies for memory profiling”, and the “ANTS Memory Profiler forum”. I’m normally the kind of person who looks at a screen like that only to find the “Don’t show this again” checkbox. Since I was doing a review, though, I decided I should examine them. I was pleasantly surprised. The overview video clocks in at three minutes and fifty seconds. It begins by showing you how to get started profiling an application. It explains that profiling is done by taking memory snapshots periodically while your program is running and then comparing them. ANTS Memory Profiler (I’m just going to call it “ANTS MP” from here) analyzes these snapshots in the background while your application is running. It briefly mentions a new feature in Version 7, a new API that give you the ability to trigger snapshots from within your application’s source code (more about this below). You can also, and this is the more common way you would do it, take a memory snapshot at any time from within the ANTS MP window by clicking the “Take Memory Snapshot” button in the upper right corner. The overview video goes on to demonstrate a basic profiling session on an application that pulls information from a database and displays it. It shows how to switch which snapshots you are comparing, explains the different sections of the Summary view and what they are showing, and proceeds to show you how to investigate memory problems using the “Instance Categorizer” to track the path from an object (or set of objects) to the GC’s root in order to find what things along the path are holding a reference to it/them. For a set of objects, you can then click on it and get the “Instance List” view. This displays all of the individual objects (including their individual sizes, values, etc.) of that type which share the same path to the GC root. You can then click on one of the objects to generate an “Instance Retention Graph” view. This lets you track directly up to see the reference chain for that individual object. In the overview video, it turned out that there was an event handler which was holding on to a reference, thereby keeping a large number of strings that should have been freed in memory. Lastly the video shows the “Class List” view, which lets you dig in deeply to find problems that might not have been clear when following the previous workflow. Once you have at least one memory snapshot you can begin analyzing. The main interface is in the “Analysis” tab. You can also switch to the “Session Overview” tab, which gives you several bar charts highlighting basic memory data about the snapshots you’ve taken. If you hover over the individual bars (and the individual colors in bars that have more than one), you will see a detailed text description of what the bar is representing visually. The Session Overview is good for a quick summary of memory usage and information about the different heaps. You are going to spend most of your time in the Analysis tab, but it’s good to remember that the Session Overview is there to give you some quick feedback on basic memory usage stats. As described above in the summary of the overview video, there is a certain natural workflow to the Analysis tab. You’ll spin up your application and take some snapshots at various times such as before and after clicking a button to open a window or before and after closing a window. Taking these snapshots lets you examine what is happening with memory. You would normally expect that a lot of memory would be freed up when closing a window or exiting a document. By taking snapshots before and after performing an action like that you can see whether or not the memory is really being freed. If you already know an area that’s giving you trouble, you can run your application just like normal until just before getting to that part and then you can take a few strategic snapshots that should help you pin down the problem. Something the overview didn’t go into is how to use the “Filters” section at the bottom of ANTS MP together with the Class List view in order to narrow things down. The video tutorials page has a nice 3 minute intro video called “How to use the filters”. It’s a nice introduction and covers some of the basics. I’m going to cover a bit more because I think they’re a really neat, really helpful feature. Large programs can bring up thousands of classes. Even simple programs can instantiate far more classes than you might realize. In a basic .NET 4 WPF application for example (and when I say basic, I mean just MainWindow.xaml with a button added to it), the unfiltered Class List view will have in excess of 1000 classes (my simple test app had anywhere from 1066 to 1148 classes depending on which snapshot I was using as the “Current” snapshot). This is amazing in some ways as it shows you how in stark detail just how immensely powerful the WPF framework is. But hunting through 1100 classes isn’t productive, no matter how cool it is that there are that many classes instantiated and doing all sorts of awesome things. Let’s say you wanted to examine just the classes your application contains source code for (in my simple example, that would be the MainWindow and App). Under “Basic Filters”, click on “Classes with source” under “Show only…”. Voilà. Down from 1070 classes in the snapshot I was using as “Current” to 2 classes. If you then click on a class’s name, it will show you (to the right of the class name) two little icon buttons. Hover over them and you will see that you can click one to view the Instance Categorizer for the class and another to view the Instance List for the class. You can also show classes based on which heap they live on. If you chose both a Baseline snapshot and a Current snapshot then you can use the “Comparing snapshots” filters to show only: “New objects”; “Surviving objects”; “Survivors in growing classes”; or “Zombie objects” (if you aren’t sure what one of these means, you can click the helpful “?” in a green circle icon to bring up a popup that explains them and provides context). Remember that your selection(s) under the “Show only…” heading will still apply, so you should update those selections to make sure you are seeing the view you want. There are also links under the “What is my memory problem?” heading that can help you diagnose the problems you are seeing including one for “I don’t know which kind I have” for situations where you know generally that your application has some problems but aren’t sure what the behavior you have been seeing (OutOfMemoryExceptions, continually growing memory usage, larger memory use than expected at certain points in the program). The Basic Filters are not the only filters there are. “Filter by Object Type” gives you the ability to filter by: “Objects that are disposable”; “Objects that are/are not disposed”; “Objects that are/are not GC roots” (GC roots are things like static variables); and “Objects that implement _______”. “Objects that implement” is particularly neat. Once you check the box, you can then add one or more classes and interfaces that an object must implement in order to survive the filtering. Lastly there is “Filter by Reference”, which gives you the option to pare down the list based on whether an object is “Kept in memory exclusively by” a particular item, a class/interface, or a namespace; whether an object is “Referenced by” one or more of those choices; and whether an object is “Never referenced by” one or more of those choices. Remember that filtering is cumulative, so anything you had set in one of the filter sections still remains in effect unless and until you go back and change it. There’s quite a bit more to ANTS MP – it’s a very full featured product – but I think I touched on all of the most significant pieces. You can use it to debug: a .NET executable; an ASP.NET web application (running on IIS); an ASP.NET web application (running on Visual Studio’s built-in web development server); a Silverlight 4 browser application; a Windows service; a COM+ server; and even something called an XBAP (local XAML browser application). You can also attach to a .NET 4 process to profile an application that’s already running. The startup screen also has a large number of “Charting Options” that let you adjust which statistics ANTS MP should collect. The default selection is a good, minimal set. It’s worth your time to browse through the charting options to examine other statistics that may also help you diagnose a particular problem. The more statistics ANTS MP collects, the longer it will take to collect statistics. So just turning everything on is probably a bad idea. But the option to selectively add in additional performance counters from the extensive list could be a very helpful thing for your memory profiling as it lets you see additional data that might provide clues about a particular problem that has been bothering you. ANTS MP integrates very nicely with all versions of Visual Studio that support plugins (i.e. all of the non-Express versions). Just note that if you choose “Profile Memory” from the “ANTS” menu that it will launch profiling for whichever project you have set as the Startup project. One quick tip from my experience so far using ANTS MP: if you want to properly understand your memory usage in an application you’ve written, first create an “empty” version of the type of project you are going to profile (a WPF application, an XNA game, etc.) and do a quick profiling session on that so that you know the baseline memory usage of the framework itself. By “empty” I mean just create a new project of that type in Visual Studio then compile it and run it with profiling – don’t do anything special or add in anything (except perhaps for any external libraries you’re planning to use). The first thing I tried ANTS MP out on was a demo XNA project of an editor that I’ve been working on for quite some time that involves a custom extension to XNA’s content pipeline. The first time I ran it and saw the unmanaged memory usage I was convinced I had some horrible bug that was creating extra copies of texture data (the demo project didn’t have a lot of texture data so when I saw a lot of unmanaged memory I instantly figured I was doing something wrong). Then I thought to run an empty project through and when I saw that the amount of unmanaged memory was virtually identical, it dawned on me that the CLR itself sits in unmanaged memory and that (thankfully) there was nothing wrong with my code! Quite a relief. Earlier, when discussing the overview video, I mentioned the API that lets you take snapshots from within your application. I gave it a quick trial and it’s very easy to integrate and make use of and is a really nice addition (especially for projects where you want to know what, if any, allocations there are in a specific, complicated section of code). The only concern I had was that if I hadn’t watched the overview video I might never have known it existed. Even then it took me five minutes of hunting around Red Gate’s website before I found the “Taking snapshots from your code" article that explains what DLL you need to add as a reference and what method of what class you should call in order to take an automatic snapshot (including the helpful warning to wrap it in a try-catch block since, under certain circumstances, it can raise an exception, such as trying to call it more than 5 times in 30 seconds. The difficulty in discovering and then finding information about the automatic snapshots API was one thing I thought could use improvement. Another thing I think would make it even better would be local copies of the webpages it links to. Although I’m generally always connected to the internet, I imagine there are more than a few developers who aren’t or who are behind very restrictive firewalls. For them (and for me, too, if my internet connection happens to be down), it would be nice to have those documents installed locally or to have the option to download an additional “documentation” package that would add local copies. Another thing that I wish could be easier to manage is the Filters area. Finding and setting individual filters is very easy as is understanding what those filter do. And breaking it up into three sections (basic, by object, and by reference) makes sense. But I could easily see myself running a long profiling session and forgetting that I had set some filter a long while earlier in a different filter section and then spending quite a bit of time trying to figure out why some problem that was clearly visible in the data wasn’t showing up in, e.g. the instance list before remembering to check all the filters for that one setting that was only culling a few things from view. Some sort of indicator icon next to the filter section names that appears you have at least one filter set in that area would be a nice visual clue to remind me that “oh yeah, I told it to only show objects on the Gen 2 heap! That’s why I’m not seeing those instances of the SuperMagic class!” Something that would be nice (but that Red Gate cannot really do anything about) would be if this could be used in Windows Phone 7 development. If Microsoft and Red Gate could work together to make this happen (even if just on the WP7 emulator), that would be amazing. Especially given the memory constraints that apps and games running on mobile devices need to work within, a good memory profiler would be a phenomenally helpful tool. If anyone at Microsoft reads this, it’d be really great if you could make something like that happen. Perhaps even a (subsidized) custom version just for WP7 development. (For XNA games, of course, you can create a Windows version of the game and use ANTS MP on the Windows version in order to get a better picture of your memory situation. For Silverlight on WP7, though, there’s quite a bit of educated guess work and WeakReference creation followed by forced collections in order to find the source of a memory problem.) The only other thing I found myself wanting was a “Back” button. Between my Windows Phone 7, Zune, and other things, I’ve grown very used to having a “back stack” that lets me just navigate back to where I came from. The ANTS MP interface is surprisingly easy to use given how much it lets you do, and once you start using it for any amount of time, you learn all of the different areas such that you know where to go. And it does remember the state of the areas you were previously in, of course. So if you go to, e.g., the Instance Retention Graph from the Class List and then return back to the Class List, it will remember which class you had selected and all that other state information. Still, a “Back” button would be a welcome addition to a future release. Bottom Line ANTS Memory Profiler is not an inexpensive tool. But my time is valuable. I can easily see ANTS MP saving me enough time tracking down memory problems to justify it on a cost basis. More importantly to me, knowing what is happening memory-wise in my programs and having the confidence that my code doesn’t have any hidden time bombs in it that will cause it to OOM if I leave it running for longer than I do when I spin it up real quickly for debugging or just to see how a new feature looks and feels is a good feeling. It’s a feeling that I like having and want to continue to have. I got the current version for free in order to review it. Having done so, I’ve now added it to my must-have tools and will gladly lay out the money for the next version when it comes out. It has a 14 day free trial, so if you aren’t sure if it’s right for you or if you think it seems interesting but aren’t really sure if it’s worth shelling out the money for it, give it a try.

Read the article

Microsoft Introduces WebMatrix

- by Rick Strahl

originally published in CoDe Magazine Editorial Microsoft recently released the first CTP of a new development environment called WebMatrix, which along with some of its supporting technologies are squarely aimed at making the Microsoft Web Platform more approachable for first-time developers and hobbyists. But in the process, it also provides some updated technologies that can make life easier for existing .NET developers. Let’s face it: ASP.NET development isn’t exactly trivial unless you already have a fair bit of familiarity with sophisticated development practices. Stick a non-developer in front of Visual Studio .NET or even the Visual Web Developer Express edition and it’s not likely that the person in front of the screen will be very productive or feel inspired. Yet other technologies like PHP and even classic ASP did provide the ability for non-developers and hobbyists to become reasonably proficient in creating basic web content quickly and efficiently. WebMatrix appears to be Microsoft’s attempt to bring back some of that simplicity with a number of technologies and tools. The key is to provide a friendly and fully self-contained development environment that provides all the tools needed to build an application in one place, as well as tools that allow publishing of content and databases easily to the web server. WebMatrix is made up of several components and technologies: IIS Developer Express IIS Developer Express is a new, self-contained development web server that is fully compatible with IIS 7.5 and based on the same codebase that IIS 7.5 uses. This new development server replaces the much less compatible Cassini web server that’s been used in Visual Studio and the Express editions. IIS Express addresses a few shortcomings of the Cassini server such as the inability to serve custom ISAPI extensions (i.e., things like PHP or ASP classic for example), as well as not supporting advanced authentication. IIS Developer Express provides most of the IIS 7.5 feature set providing much better compatibility between development and live deployment scenarios. SQL Server Compact 4.0 Database access is a key component for most web-driven applications, but on the Microsoft stack this has mostly meant you have to use SQL Server or SQL Server Express. SQL Server Compact is not new-it’s been around for a few years, but it’s been severely hobbled in the past by terrible tool support and the inability to support more than a single connection in Microsoft’s attempt to avoid losing SQL Server licensing. The new release of SQL Server Compact 4.0 supports multiple connections and you can run it in ASP.NET web applications simply by installing an assembly into the bin folder of the web application. In effect, you don’t have to install a special system configuration to run SQL Compact as it is a drop-in database engine: Copy the small assembly into your BIN folder (or from the GAC if installed fully), create a connection string against a local file-based database file, and then start firing SQL requests. Additionally WebMatrix includes nice tools to edit the database tables and files, along with tools to easily upsize (and hopefully downsize in the future) to full SQL Server. This is a big win, pending compatibility and performance limits. In my simple testing the data engine performed well enough for small data sets. This is not only useful for web applications, but also for desktop applications for which a fully installed SQL engine like SQL Server would be overkill. Having a local data store in those applications that can potentially be accessed by multiple users is a welcome feature. ASP.NET Razor View Engine What? Yet another native ASP.NET view engine? We already have Web Forms and various different flavors of using that view engine with Web Forms and MVC. Do we really need another? Microsoft thinks so, and Razor is an implementation of a lightweight, script-only view engine. Unlike the Web Forms view engine, Razor works only with inline code, snippets, and markup; therefore, it is more in line with current thinking of what a view engine should represent. There’s no support for a “page model” or any of the other Web Forms features of the full-page framework, but just a lightweight scripting engine that works with plain markup plus embedded expressions and code. The markup syntax for Razor is geared for minimal typing, plus some progressive detection of where a script block/expression starts and ends. This results in a much leaner syntax than the typical ASP.NET Web Forms alligator (<% %>) tags. Razor uses the @ sign plus standard C# (or Visual Basic) block syntax to delineate code snippets and expressions. Here’s a very simple example of what Razor markup looks like along with some comment annotations: <!DOCTYPE html> <html> <head> <title></title> </head> <body> <h1>Razor Test</h1>  @DateTime.Now <hr />  @DateTime.Now.ToString("T")  @{ List<string> names = new List<string>(); names.Add("Rick"); names.Add("Markus"); names.Add("Claudio"); names.Add("Kevin"); }  <ul> @foreach(string name in names){ <li>@name</li> } </ul>  @if(true) {  <text> true </text>; } else {  <text> false </text>; } </body> </html> Like the Web Forms view engine, Razor parses pages into code, and then executes that run-time compiled code. Effectively a “page” becomes a code file with markup becoming literal text written into the Response stream, code snippets becoming raw code, and expressions being written out with Response.Write(). The code generated from Razor doesn’t look much different from similar Web Forms code that only uses script tags; so although the syntax may look different, the operational model is fairly similar to the Web Forms engine minus the overhead of the large Page object model. However, there are differences: -Razor pages are based on a new base class, Microsoft.WebPages.WebPage, which is hosted in the Microsoft.WebPages assembly that houses all the Razor engine parsing and processing logic. Browsing through the assembly (in the generated ASP.NET Temporary Files folder or GAC) will give you a good idea of the functionality that Razor provides. If you look closely, a lot of the feature set matches ASP.NET MVC’s view implementation as well as many of the helper classes found in MVC. It’s not hard to guess the motivation for this sort of view engine: For beginning developers the simple markup syntax is easier to work with, although you obviously still need to have some understanding of the .NET Framework in order to create dynamic content. The syntax is easier to read and grok and much shorter to type than ASP.NET alligator tags (<% %>) and also easier to understand aesthetically what’s happening in the markup code. Razor also is a better fit for Microsoft’s vision of ASP.NET MVC: It’s a new view engine without the baggage of Web Forms attached to it. The engine is more lightweight since it doesn’t carry all the features and object model of Web Forms with it and it can be instantiated directly outside of the HTTP environment, which has been rather tricky to do for the Web Forms view engine. Having a standalone script parser is a huge win for other applications as well – it makes it much easier to create script or meta driven output generators for many types of applications from code/screen generators, to simple form letters to data merging applications with user customizability. For me personally this is very useful side effect and who knows maybe Microsoft will actually standardize they’re scripting engines (die T4 die!) on this engine. Razor also better fits the “view-based” approach where the view is supposed to be mostly a visual representation that doesn’t hold much, if any, code. While you can still use code, the code you do write has to be self-contained. Overall I wouldn’t be surprised if Razor will become the new standard view engine for MVC in the future – and in fact there have been announcements recently that Razor will become the default script engine in ASP.NET MVC 3.0. Razor can also be used in existing Web Forms and MVC applications, although that’s not working currently unless you manually configure the script mappings and add the appropriate assemblies. It’s possible to do it, but it’s probably better to wait until Microsoft releases official support for Razor scripts in Visual Studio. Once that happens, you can simply drop .cshtml and .vbhtml pages into an existing ASP.NET project and they will work side by side with classic ASP.NET pages. WebMatrix Development Environment To tie all of these three technologies together, Microsoft is shipping WebMatrix with an integrated development environment. An integrated gallery manager makes it easy to download and load existing projects, and then extend them with custom functionality. It seems to be a prominent goal to provide community-oriented content that can act as a starting point, be it via a custom templates or a complete standard application. The IDE includes a project manager that works with a single project and provides an integrated IDE/editor for editing the .cshtml and .vbhtml pages. A run button allows you to quickly run pages in the project manager in a variety of browsers. There’s no debugging support for code at this time. Note that Razor pages don’t require explicit compilation, so making a change, saving, and then refreshing your page in the browser is all that’s needed to see changes while testing an application locally. It’s essentially using the auto-compiling Web Project that was introduced with .NET 2.0. All code is compiled during run time into dynamically created assemblies in the ASP.NET temp folder. WebMatrix also has PHP Editing support with syntax highlighting. You can load various PHP-based applications from the WebMatrix Web Gallery directly into the IDE. Most of the Web Gallery applications are ready to install and run without further configuration, with Wizards taking you through installation of tools, dependencies, and configuration of the database as needed. WebMatrix leverages the Web Platform installer to pull the pieces down from websites in a tight integration of tools that worked nicely for the four or five applications I tried this out on. Click a couple of check boxes and fill in a few simple configuration options and you end up with a running application that’s ready to be customized. Nice! You can easily deploy completed applications via WebDeploy (to an IIS server) or FTP directly from within the development environment. The deploy tool also can handle automatically uploading and installing the database and all related assemblies required, making deployment a simple one-click install step. Simplified Database Access The IDE contains a database editor that can edit SQL Compact and SQL Server databases. There is also a Database helper class that facilitates database access by providing easy-to-use, high-level query execution and iteration methods: @{ var db = Database.OpenFile("FirstApp.sdf"); string sql = "select * from customers where Id > @0"; } <ul> @foreach(var row in db.Query(sql,1)){ <li>@row.FirstName @row.LastName</li> } </ul> The query function takes a SQL statement plus any number of positional (@0,@1 etc.) SQL parameters by simple values. The result is returned as a collection of rows which in turn have a row object with dynamic properties for each of the columns giving easy (though untyped) access to each of the fields. Likewise Execute and ExecuteNonQuery allow execution of more complex queries using similar parameter passing schemes. Note these queries use string-based queries rather than LINQ or Entity Framework’s strongly typed LINQ queries. While this may seem like a step back, it’s also in line with the expectations of non .NET script developers who are quite used to writing and using SQL strings in code rather than using OR/M frameworks. The only question is why was something not included from the beginning in .NET and Microsoft made developers build custom implementations of these basic building blocks. The implementation looks a lot like a DataTable-style data access mechanism, but to be fair, this is a common approach in scripting languages. This type of syntax that uses simple, static, data object methods to perform simple data tasks with one line of code are common in scripting languages and are a good match for folks working in PHP/Python, etc. Seems like Microsoft has taken great advantage of .NET 4.0’s dynamic typing to provide this sort of interface for row iteration where each row has properties for each field. FWIW, all the examples demonstrate using local SQL Compact files - I was unable to get a SQL Server connection string to work with the Database class (the connection string wasn’t accepted). However, since the code in the page is still plain old .NET, you can easily use standard ADO.NET code or even LINQ or Entity Framework models that are created outside of WebMatrix in separate assemblies as required. The good the bad the obnoxious - It’s still .NET The beauty (or curse depending on how you look at it :)) of Razor and the compilation model is that, behind it all, it’s still .NET. Although the syntax may look foreign, it’s still all .NET behind the scenes. You can easily access existing tools, helpers, and utilities simply by adding them to the project as references or to the bin folder. Razor automatically recognizes any assembly reference from assemblies in the bin folder. In the default configuration, Microsoft provides a host of helper functions in a Microsoft.WebPages assembly (check it out in the ASP.NET temp folder for your application), which includes a host of HTML Helpers. If you’ve used ASP.NET MVC before, a lot of the helpers should look familiar. Documentation at the moment is sketchy-there’s a very rough API reference you can check out here: http://www.asp.net/webmatrix/tutorials/asp-net-web-pages-api-reference Who needs WebMatrix? Uhm… good Question Clearly Microsoft is trying hard to create an environment with WebMatrix that is easy to use for newbie developers. The goal seems to be simplicity in providing a minimal development environment and an easy-to-use script engine/language that makes it easy to get started with. There’s also some focus on community features that can be used as starting points, such as Web Gallery applications and templates. The community features in particular are very nice and something that would be nice to eventually see in Visual Studio as well. The question is whether this is too little too late. Developers who have been clamoring for a simpler development environment on the .NET stack have mostly left for other simpler platforms like PHP or Python which are catering to the down and dirty developer. Microsoft will be hard pressed to win those folks-and other hardcore PHP developers-back. Regardless of how much you dress up a script engine fronted by the .NET Framework, it’s still the .NET Framework and all the complexity that drives it. While .NET is a fine solution in its breadth and features once you get a basic handle on the core features, the bar of entry to being productive with the .NET Framework is still pretty high. The MVC style helpers Microsoft provides are a good step in the right direction, but I suspect it’s not enough to shield new developers from having to delve much deeper into the Framework to get even basic applications built. Razor and its helpers is trying to make .NET more accessible but the reality is that in order to do useful stuff that goes beyond the handful of simple helpers you still are going to have to write some C# or VB or other .NET code. If the target is a hobby/amateur/non-programmer the learning curve isn’t made any easier by WebMatrix it’s just been shifted a tad bit further along in your development endeavor when you run out of canned components that are supplied either by Microsoft or the community. The database helpers are interesting and actually I’ve heard a lot of discussion from various developers who’ve been resisting .NET for a really long time perking up at the prospect of easier data access in .NET than the ridiculous amount of code it takes to do even simple data access with raw ADO.NET. It seems sad that such a simple concept and implementation should trigger this sort of response (especially since it’s practically trivial to create helpers like these or pick them up from countless libraries available), but there it is. It also shows that there are plenty of developers out there who are more interested in ‘getting stuff done’ easily than necessarily following the latest and greatest practices which are overkill for many development scenarios. Sometimes it seems that all of .NET is focused on the big life changing issues of development, rather than the bread and butter scenarios that many developers are interested in to get their work accomplished. And that in the end may be WebMatrix’s main raison d'être: To bring some focus back at Microsoft that simpler and more high level solutions are actually needed to appeal to the non-high end developers as well as providing the necessary tools for the high end developers who want to follow the latest and greatest trends. The current version of WebMatrix hits many sweet spots, but it also feels like it has a long way to go before it really can be a tool that a beginning developer or an accomplished developer can feel comfortable with. Although there are some really good ideas in the environment (like the gallery for downloading apps and components) which would be a great addition for Visual Studio as well, the rest of the development environment just feels like crippleware with required functionality missing especially debugging and Intellisense, but also general editor support. It’s not clear whether these are because the product is still in an early alpha release or whether it’s simply designed that way to be a really limited development environment. While simple can be good, nobody wants to feel left out when it comes to necessary tool support and WebMatrix just has that left out feeling to it. If anything WebMatrix’s technology pieces (which are really independent of the WebMatrix product) are what are interesting to developers in general. The compact IIS implementation is a nice improvement for development scenarios and SQL Compact 4.0 seems to address a lot of concerns that people have had and have complained about for some time with previous SQL Compact implementations. By far the most interesting and useful technology though seems to be the Razor view engine for its light weight implementation and it’s decoupling from the ASP.NET/HTTP pipeline to provide a standalone scripting/view engine that is pluggable. The first winner of this is going to be ASP.NET MVC which can now have a cleaner view model that isn’t inconsistent due to the baggage of non-implemented WebForms features that don’t work in MVC. But I expect that Razor will end up in many other applications as a scripting and code generation engine eventually. Visual Studio integration for Razor is currently missing, but is promised for a later release. The ASP.NET MVC team has already mentioned that Razor will eventually become the default MVC view engine, which will guarantee continued growth and development of this tool along those lines. And the Razor engine and support tools actually inherit many of the features that MVC pioneered, so there’s some synergy flowing both ways between Razor and MVC. As an existing ASP.NET developer who’s already familiar with Visual Studio and ASP.NET development, the WebMatrix IDE doesn’t give you anything that you want. The tools provided are minimal and provide nothing that you can’t get in Visual Studio today, except the minimal Razor syntax highlighting, so there’s little need to take a step back. With Visual Studio integration coming later there’s little reason to look at WebMatrix for tooling. It’s good to see that Microsoft is giving some thought about the ease of use of .NET as a platform For so many years, we’ve been piling on more and more new features without trying to take a step back and see how complicated the development/configuration/deployment process has become. Sometimes it’s good to take a step - or several steps - back and take another look and realize just how far we’ve come. WebMatrix is one of those reminders and one that likely will result in some positive changes on the platform as a whole. © Rick Strahl, West Wind Technologies, 2005-2010Posted in ASP.NET IIS7

Read the article

Benchmark Linq2SQL, Subsonic2, Subsonic3 - Any other ideas to make them faster ?

- by Aristos

I am working with Subsonic 2 more than 3 years now... After Linq appears and then Subsonic 3, I start thinking about moving to the new Linq futures that are connected to sql. I must say that I start move and port my subsonic 2 with SubSonic 3, and very soon I discover that the speed was so slow thats I didn't believe it - and starts all that tests. Then I test Linq2Sql and see also a delay - compare it with Subsonic 2. My question here is, especial for the linq2sql, and the up-coming dotnet version 4, what else can I do to speed it up ? What else on linq2sql settings, or classes, not on this code that I have used for my messures I place here the project that I make the tests, also the screen shots of the results. How I make the tests - and the accurate of my measures. I use only for my question Google chrome, because its difficult for me to show here a lot of other measures that I have done with more complex programs. This is the most simple one, I just measure the Data Read. How can I prove that. I make a simple Thread.Sleep(10 seconds) and see if I see that 10 seconds on Google Chrome Measure, and yes I see it. here are more test with this Sleep thead to see whats actually Chrome gives. 10 seconds delay 100 ms delay Zero delay There is only a small 15ms thats get on messure, is so small compare it with the rest of my tests that I do not care about. So what I measure I measure just the data read via each method - did not count the data or database delay, or any disk read or anything like that. Later on the image with the result I show that no disk activity exist on the measures See this image to see what really I measure and if this is correct Why I chose this kind of test Its simple, it's real, and it's near my real problem that I found the delay of subsonic 3 in real program with real data. Now lets tests the dals Start by see this image I have 4-5 calls on every method, the one after the other. The results are. For a loop of 100 times, ask for 5 Rows, one not exist, approximatively.. Simple adonet:81ms SubSonic 2 :210ms linq2sql :1.70sec linq2sql using CompiledQuery.Compile :239ms Subsonic 3 :15.00sec (wow - extreme slow) The project http://www.planethost.gr/DalSpeedTests.rar Can any one confirm this benchmark, or make any optimizations to help me out ? Other tests Some one publish here this link http://ormbattle.net/ (and then remove it - don not know why) In this page you can find a really useful advanced tests for all, except subsonic 2 and subsonic 3 that I have here ! Optimizing What I really ask here is if some one can now any trick how to optimize the DALs, not by changing the test code, but by changing the code and the settings on each dal. For example... Optimizing Linq2SQL I start search how to optimize Linq2sql and found this article, and maybe more exist. Finally I make the tricks from that page to run, and optimize the code using them all. The speed was near 1.50sec from 1.70.... big improvement, but still slow. Then I found a different way - same idea article, and wow ! the speed is blow up. Using this trick with CompiledQuery.Compile, the time from 1.5sec is now 239ms. Here is the code for the precompiled... Func<DataClassesDataContext, int, IQueryable<Product>> compiledQuery = CompiledQuery.Compile((DataClassesDataContext meta, int IdToFind) => (from myData in meta.Products where myData.ProductID.Equals(IdToFind) select myData)); StringBuilder Test = new StringBuilder(); int[] MiaSeira = { 5, 6, 10, 100, 7 }; using (DataClassesDataContext context = new DataClassesDataContext()) { context.ObjectTrackingEnabled = false; for (int i = 0; i < 100; i++) { foreach (int EnaID in MiaSeira) { var oFindThat2P = compiledQuery(context, EnaID); foreach (Product One in oFindThat2P) { Test.Append("<br />"); Test.Append(One.ProductName); } } } } Optimizing SubSonic 3 and problems I make many performance profiling, and start change the one after the other and the speed is better but still too slow. I post them on subsonic group but they ignore the problem, they say that everything is fast... Here is some capture of my profiling and delay points inside subsonic source code I have end up that subsonic3 make more call on the structure of the database rather than on data itself. Needs to reconsider the hole way of asking for data, and follow the subsonic2 idea if this is possible. Try to make precompile to subsonic 3 like I did in linq2Sql but fail for the moment... Optimizing SubSonic 2 After I discover that subsonic 3 is extreme slow, I start my checks on subsonic 2 - that I have never done before believing that is fast. (and it is) So its come up with some points that can be faster. For example there are many loops like this ones that actually is slow because of string manipulation and compares inside the loop. I must say to you that this code called million of times ! on a period of few minutes ! of data asking from the program. On small amount of tables and small fields maybe this is not a big think for some people, but on large amount of tables, the delay is even more. So I decide and optimize the subsonic 2 by my self, by replacing the string compares, with number compares! Simple. I do that almost on every point that profiler say that is slow. I change also all small points that can be even a little faster, and disable some not so used thinks. The results, 5% faster on NorthWind database, near 20% faster on my database with 250 tables. That is count with 500ms less in 10 seconds process on northwind, 100ms faster on my database on 500ms process time. I do not have captures to show you for that because I have made them with different code, different time, and track them down on paper. Anyway this is my story and my question on all that, what else do you know to make them even faster... For this measures I have use Subsonic 2.2 optimized by me, Subsonic 3.0.0.3 a little optimized by me, and Dot.Net 3.5

Read the article

actionscript3: reflect-class applied on rotationY

- by algro

Hi, I'm using a class which applies a visual reflection-effect to defined movieclips. I use a reflection-class from here: link to source. It works like a charm except when I apply a rotation to the movieclip. In my case the reflection is still visible but only a part of it. What am I doing wrong? How could I pass/include the rotation to the Reflection-Class ? Thanks in advance! This is how you apply the Reflection Class to your movieclip: var ref_mc:MovieClip = new MoviClip(); addChild(ref_mc); var r1:Reflect = new Reflect({mc:ref_mc, alpha:50, ratio:50,distance:0, updateTime:0,reflectionDropoff:1}); Now I apply a rotation to my movieclip: ref_mc.rotationY = 30; And Here the Reflect-Class: package com.pixelfumes.reflect{ import flash.display.MovieClip; import flash.display.DisplayObject; import flash.display.BitmapData; import flash.display.Bitmap; import flash.geom.Matrix; import flash.display.GradientType; import flash.display.SpreadMethod; import flash.utils.setInterval; import flash.utils.clearInterval; public class Reflect extends MovieClip{ //Created By Ben Pritchard of Pixelfumes 2007 //Thanks to Mim, Jasper, Jason Merrill and all the others who //have contributed to the improvement of this class //static var for the version of this class private static var VERSION:String = "4.0"; //reference to the movie clip we are reflecting private var mc:MovieClip; //the BitmapData object that will hold a visual copy of the mc private var mcBMP:BitmapData; //the BitmapData object that will hold the reflected image private var reflectionBMP:Bitmap; //the clip that will act as out gradient mask private var gradientMask_mc:MovieClip; //how often the reflection should update (if it is video or animated) private var updateInt:Number; //the size the reflection is allowed to reflect within private var bounds:Object; //the distance the reflection is vertically from the mc private var distance:Number = 0; function Reflect(args:Object){ /*the args object passes in the following variables /we set the values of our internal vars to math the args*/ //the clip being reflected mc = args.mc; //the alpha level of the reflection clip var alpha:Number = args.alpha/100; //the ratio opaque color used in the gradient mask var ratio:Number = args.ratio; //update time interval var updateTime:Number = args.updateTime; //the distance at which the reflection visually drops off at var reflectionDropoff:Number = args.reflectionDropoff; //the distance the reflection starts from the bottom of the mc var distance:Number = args.distance; //store width and height of the clip var mcHeight = mc.height; var mcWidth = mc.width; //store the bounds of the reflection bounds = new Object(); bounds.width = mcWidth; bounds.height = mcHeight; //create the BitmapData that will hold a snapshot of the movie clip mcBMP = new BitmapData(bounds.width, bounds.height, true, 0xFFFFFF); mcBMP.draw(mc); //create the BitmapData the will hold the reflection reflectionBMP = new Bitmap(mcBMP); //flip the reflection upside down reflectionBMP.scaleY = -1; //move the reflection to the bottom of the movie clip reflectionBMP.y = (bounds.height*2) + distance; //add the reflection to the movie clip's Display Stack var reflectionBMPRef:DisplayObject = mc.addChild(reflectionBMP); reflectionBMPRef.name = "reflectionBMP"; //add a blank movie clip to hold our gradient mask var gradientMaskRef:DisplayObject = mc.addChild(new MovieClip()); gradientMaskRef.name = "gradientMask_mc"; //get a reference to the movie clip - cast the DisplayObject that is returned as a MovieClip gradientMask_mc = mc.getChildByName("gradientMask_mc") as MovieClip; //set the values for the gradient fill var fillType:String = GradientType.LINEAR; var colors:Array = [0xFFFFFF, 0xFFFFFF]; var alphas:Array = [alpha, 0]; var ratios:Array = [0, ratio]; var spreadMethod:String = SpreadMethod.PAD; //create the Matrix and create the gradient box var matr:Matrix = new Matrix(); //set the height of the Matrix used for the gradient mask var matrixHeight:Number; if (reflectionDropoff<=0) { matrixHeight = bounds.height; } else { matrixHeight = bounds.height/reflectionDropoff; } matr.createGradientBox(bounds.width, matrixHeight, (90/180)*Math.PI, 0, 0); //create the gradient fill gradientMask_mc.graphics.beginGradientFill(fillType, colors, alphas, ratios, matr, spreadMethod); gradientMask_mc.graphics.drawRect(0,0,bounds.width,bounds.height); //position the mask over the reflection clip gradientMask_mc.y = mc.getChildByName("reflectionBMP").y - mc.getChildByName("reflectionBMP").height; //cache clip as a bitmap so that the gradient mask will function gradientMask_mc.cacheAsBitmap = true; mc.getChildByName("reflectionBMP").cacheAsBitmap = true; //set the mask for the reflection as the gradient mask mc.getChildByName("reflectionBMP").mask = gradientMask_mc; //if we are updating the reflection for a video or animation do so here if(updateTime > -1){ updateInt = setInterval(update, updateTime, mc); } } public function setBounds(w:Number,h:Number):void{ //allows the user to set the area that the reflection is allowed //this is useful for clips that move within themselves bounds.width = w; bounds.height = h; gradientMask_mc.width = bounds.width; redrawBMP(mc); } public function redrawBMP(mc:MovieClip):void { // redraws the bitmap reflection - Mim Gamiet [2006] mcBMP.dispose(); mcBMP = new BitmapData(bounds.width, bounds.height, true, 0xFFFFFF); mcBMP.draw(mc); } private function update(mc):void { //updates the reflection to visually match the movie clip mcBMP = new BitmapData(bounds.width, bounds.height, true, 0xFFFFFF); mcBMP.draw(mc); reflectionBMP.bitmapData = mcBMP; } public function destroy():void{ //provides a method to remove the reflection mc.removeChild(mc.getChildByName("reflectionBMP")); reflectionBMP = null; mcBMP.dispose(); clearInterval(updateInt); mc.removeChild(mc.getChildByName("gradientMask_mc")); } } }

Read the article

High Load mysql on Debian server stops every day. Why?

- by Oleg Abrazhaev

I have Debian server with 32 gb memory. And there is apache2, memcached and nginx on this server. Memory load always on maximum. Only 500m free. Most memory leak do MySql. Apache only 70 clients configured, other services small memory usage. When mysql use all memory it stops. And nothing works, need mysql reboot. Mysql configured use maximum 24 gb memory. I have hight weight InnoDB bases. (400000 rows, 30 gb). And on server multithread daemon, that makes many inserts in this tables, thats why InnoDB. There is my mysql config. [mysqld] # # * Basic Settings # default-time-zone = "+04:00" user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp language = /usr/share/mysql/english skip-external-locking default-time-zone='Europe/Moscow' # # Instead of skip-networking the default is now to listen only on # localhost which is more compatible and is not less secure. # # * Fine Tuning # #low_priority_updates = 1 concurrent_insert = ALWAYS wait_timeout = 600 interactive_timeout = 600 #normal key_buffer_size = 2024M #key_buffer_size = 1512M #70% hot cache key_cache_division_limit= 70 #16-32 max_allowed_packet = 32M #1-16M thread_stack = 8M #40-50 thread_cache_size = 50 #orderby groupby sort sort_buffer_size = 64M #same myisam_sort_buffer_size = 400M #temp table creates when group_by tmp_table_size = 3000M #tables in memory max_heap_table_size = 3000M #on disk open_files_limit = 10000 table_cache = 10000 join_buffer_size = 5M # This replaces the startup script and checks MyISAM tables if needed # the first time they are touched myisam-recover = BACKUP #myisam_use_mmap = 1 max_connections = 200 thread_concurrency = 8 # # * Query Cache Configuration # #more ignored query_cache_limit = 50M query_cache_size = 210M #on query cache query_cache_type = 1 # # * Logging and Replication # # Both location gets rotated by the cronjob. # Be aware that this log type is a performance killer. #log = /var/log/mysql/mysql.log # # Error logging goes to syslog. This is a Debian improvement :) # # Here you can see queries with especially long duration log_slow_queries = /var/log/mysql/mysql-slow.log long_query_time = 1 log-queries-not-using-indexes # # The following can be used as easy to replay backup logs or for replication. # note: if you are setting up a replication slave, see README.Debian about # other settings you may need to change. #server-id = 1 #log_bin = /var/log/mysql/mysql-bin.log server-id = 1 log-bin = /var/lib/mysql/mysql-bin #replicate-do-db = gate log-bin-index = /var/lib/mysql/mysql-bin.index log-error = /var/lib/mysql/mysql-bin.err relay-log = /var/lib/mysql/relay-bin relay-log-info-file = /var/lib/mysql/relay-bin.info relay-log-index = /var/lib/mysql/relay-bin.index binlog_do_db = 24avia expire_logs_days = 10 max_binlog_size = 100M read_buffer_size = 4024288 innodb_buffer_pool_size = 5000M innodb_flush_log_at_trx_commit = 2 innodb_thread_concurrency = 8 table_definition_cache = 2000 group_concat_max_len = 16M #binlog_do_db = gate #binlog_ignore_db = include_database_name # # * BerkeleyDB # # Using BerkeleyDB is now discouraged as its support will cease in 5.1.12. #skip-bdb # # * InnoDB # # InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/. # Read the manual for more InnoDB related options. There are many! # You might want to disable InnoDB to shrink the mysqld process by circa 100MB. #skip-innodb # # * Security Features # # Read the manual, too, if you want chroot! # chroot = /var/lib/mysql/ # # For generating SSL certificates I recommend the OpenSSL GUI "tinyca". # # ssl-ca=/etc/mysql/cacert.pem # ssl-cert=/etc/mysql/server-cert.pem # ssl-key=/etc/mysql/server-key.pem [mysqldump] quick quote-names max_allowed_packet = 500M [mysql] #no-auto-rehash # faster start of mysql but no tab completition [isamchk] key_buffer = 32M key_buffer_size = 512M # # * NDB Cluster # # See /usr/share/doc/mysql-server-*/README.Debian for more information. # # The following configuration is read by the NDB Data Nodes (ndbd processes) # not from the NDB Management Nodes (ndb_mgmd processes). # # [MYSQL_CLUSTER] # ndb-connectstring=127.0.0.1 # # * IMPORTANT: Additional settings that can override those from this file! # The files must end with '.cnf', otherwise they'll be ignored. # !includedir /etc/mysql/conf.d/ Please, help me make it stable. Memory used /etc/mysql # free total used free shared buffers cached Mem: 32930800 32766424 164376 0 139208 23829196 -/+ buffers/cache: 8798020 24132780 Swap: 33553328 44660 33508668 Maybe my problem not in memory, but MySQL stops every day. As you can see, cache memory free 24 gb. Thank to Michael Hampton? for correction. Load overage on server 3.5. Maybe hdd or another problem? Maybe my config not optimal for 30gb InnoDB ? I'm already try mysqltuner and tunung-primer.sh , but they marked all green. Mysqltuner output mysqltuner >> MySQLTuner 1.0.1 - Major Hayden <[email protected]> >> Bug reports, feature requests, and downloads at http://mysqltuner.com/ >> Run with '--help' for additional options and output filtering -------- General Statistics -------------------------------------------------- [--] Skipped version check for MySQLTuner script [OK] Currently running supported MySQL version 5.5.24-9-log [OK] Operating on 64-bit architecture -------- Storage Engine Statistics ------------------------------------------- [--] Status: -Archive -BDB -Federated +InnoDB -ISAM -NDBCluster [--] Data in MyISAM tables: 112G (Tables: 1528) [--] Data in InnoDB tables: 39G (Tables: 340) [--] Data in PERFORMANCE_SCHEMA tables: 0B (Tables: 17) [!!] Total fragmented tables: 344 -------- Performance Metrics ------------------------------------------------- [--] Up for: 8h 18m 33s (14M q [478.333 qps], 259K conn, TX: 9B, RX: 5B) [--] Reads / Writes: 84% / 16% [--] Total buffers: 10.5G global + 81.1M per thread (200 max threads) [OK] Maximum possible memory usage: 26.3G (83% of installed RAM) [OK] Slow queries: 1% (259K/14M) [!!] Highest connection usage: 100% (201/200) [OK] Key buffer size / total MyISAM indexes: 1.5G/5.6G [OK] Key buffer hit rate: 100.0% (6B cached / 1M reads) [OK] Query cache efficiency: 74.3% (8M cached / 11M selects) [OK] Query cache prunes per day: 0 [OK] Sorts requiring temporary tables: 0% (0 temp sorts / 247K sorts) [!!] Joins performed without indexes: 106025 [!!] Temporary tables created on disk: 49% (351K on disk / 715K total) [OK] Thread cache hit rate: 99% (249 created / 259K connections) [!!] Table cache hit rate: 15% (2K open / 13K opened) [OK] Open file limit used: 15% (3K/20K) [OK] Table locks acquired immediately: 99% (4M immediate / 4M locks) [!!] InnoDB data size / buffer pool: 39.4G/5.9G -------- Recommendations ----------------------------------------------------- General recommendations: Run OPTIMIZE TABLE to defragment tables for better performance MySQL started within last 24 hours - recommendations may be inaccurate Reduce or eliminate persistent connections to reduce connection usage Adjust your join queries to always utilize indexes Temporary table size is already large - reduce result set size Reduce your SELECT DISTINCT queries without LIMIT clauses Increase table_cache gradually to avoid file descriptor limits Variables to adjust: max_connections (> 200) wait_timeout (< 600) interactive_timeout (< 600) join_buffer_size (> 5.0M, or always use indexes with joins) table_cache (> 10000) innodb_buffer_pool_size (>= 39G) Mysql primer output -- MYSQL PERFORMANCE TUNING PRIMER -- - By: Matthew Montgomery - MySQL Version 5.5.24-9-log x86_64 Uptime = 0 days 8 hrs 20 min 50 sec Avg. qps = 478 Total Questions = 14369568 Threads Connected = 16 Warning: Server has not been running for at least 48hrs. It may not be safe to use these recommendations To find out more information on how each of these runtime variables effects performance visit: http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html Visit http://www.mysql.com/products/enterprise/advisors.html for info about MySQL's Enterprise Monitoring and Advisory Service SLOW QUERIES The slow query log is enabled. Current long_query_time = 1.000000 sec. You have 260626 out of 14369701 that take longer than 1.000000 sec. to complete Your long_query_time seems to be fine BINARY UPDATE LOG The binary update log is enabled Binlog sync is not enabled, you could loose binlog records during a server crash WORKER THREADS Current thread_cache_size = 50 Current threads_cached = 45 Current threads_per_sec = 0 Historic threads_per_sec = 0 Your thread_cache_size is fine MAX CONNECTIONS Current max_connections = 200 Current threads_connected = 11 Historic max_used_connections = 201 The number of used connections is 100% of the configured maximum. You should raise max_connections INNODB STATUS Current InnoDB index space = 214 M Current InnoDB data space = 39.40 G Current InnoDB buffer pool free = 0 % Current innodb_buffer_pool_size = 5.85 G Depending on how much space your innodb indexes take up it may be safe to increase this value to up to 2 / 3 of total system memory MEMORY USAGE Max Memory Ever Allocated : 23.46 G Configured Max Per-thread Buffers : 15.84 G Configured Max Global Buffers : 7.54 G Configured Max Memory Limit : 23.39 G Physical Memory : 31.40 G Max memory limit seem to be within acceptable norms KEY BUFFER Current MyISAM index space = 5.61 G Current key_buffer_size = 1.47 G Key cache miss rate is 1 : 5578 Key buffer free ratio = 77 % Your key_buffer_size seems to be fine QUERY CACHE Query cache is enabled Current query_cache_size = 200 M Current query_cache_used = 101 M Current query_cache_limit = 50 M Current Query cache Memory fill ratio = 50.59 % Current query_cache_min_res_unit = 4 K MySQL won't cache query results that are larger than query_cache_limit in size SORT OPERATIONS Current sort_buffer_size = 64 M Current read_rnd_buffer_size = 256 K Sort buffer seems to be fine JOINS Current join_buffer_size = 5.00 M You have had 106606 queries where a join could not use an index properly You have had 8 joins without keys that check for key usage after each row join_buffer_size >= 4 M This is not advised You should enable "log-queries-not-using-indexes" Then look for non indexed joins in the slow query log. OPEN FILES LIMIT Current open_files_limit = 20210 files The open_files_limit should typically be set to at least 2x-3x that of table_cache if you have heavy MyISAM usage. Your open_files_limit value seems to be fine TABLE CACHE Current table_open_cache = 10000 tables Current table_definition_cache = 2000 tables You have a total of 1910 tables You have 2151 open tables. The table_cache value seems to be fine TEMP TABLES Current max_heap_table_size = 2.92 G Current tmp_table_size = 2.92 G Of 366426 temp tables, 49% were created on disk Perhaps you should increase your tmp_table_size and/or max_heap_table_size to reduce the number of disk-based temporary tables Note! BLOB and TEXT columns are not allow in memory tables. If you are using these columns raising these values might not impact your ratio of on disk temp tables. TABLE SCANS Current read_buffer_size = 3 M Current table scan ratio = 2846 : 1 read_buffer_size seems to be fine TABLE LOCKING Current Lock Wait ratio = 1 : 185 You may benefit from selective use of InnoDB. If you have long running SELECT's against MyISAM tables and perform frequent updates consider setting 'low_priority_updates=1'

Read the article

SOA PARTNER COMMUNITY NEWSLETTER JULY 2012

- by mseika

SOA PARTNER COMMUNITY NEWSLETTER JULY 2012 Dear SOA partner community member To provide our community members the best of our knowledge, we want your feedback on our SOA Partner community. Thus we are organizing SOA Partner Community Survey 2012. We request you to participate in the survey and give your valuable feedback on various areas of marketing, sales and education. To continue our successful BPM Suite, Oracle is launching together with you Process Accelerators initiative. It’s your opportunity to co-develop and market predefined processes. Oracle Fusion Applications Design Patterns are a great tool to develop your SOA or BPM solution or process accelerators. To promote your SOA & BPM Specialization we continue to offer several benefits. This month we would like to highlight our Specialization Plaques - make sure you request one for your office! Our Fusion Middleware Summer Camps are booked out, if could not get a seat you can attend the SOA & BPM track @ Virtual Developer Day: Oracle Fusion Development Oracle demo systems offer´s two new demos: Business Driven Development based on BPM Suite & SOA Lifecycle Management. Jürgen KressOracle SOA & BPM Partner Adoption EMEA NEW CONTENT Community SurveyProcess Accelerators KitPlaques SOA & BPM SpecializedSOA & BPM at Virtual Developer Day News from our Partners & CommunityOverview of SOA Diagnostics in 11.1.1.6 Business driven development(BDD) demo now available! SOA Lifecycle Management Oracle Fusion applications design patterns Updated material by Oracle Connect and Network SOA Blogs SOA on Facebook SOA on LinkedIn SOA on Twitter Mix SOA Forum COMMUNITY SURVEY Like every year we would like to get your feedback in our SOA Partner Community Survey 2012. Make sure that You attend to further develop our community and support our planning! It is key for us to get your feedback to prepare for the next fiscal year. Back to top PROCESS ACCELERATORS KIT Oracle is very interested to co-develop and market with you, our partners, pre-defined processes for BPM Suite.I am very happy to announce a new program called “Oracle BPM Partner Solution Catalog”. This program will provide a one-stop shop for our customers looking for Oracle BPM partner solutions available in the market today.The Oracle BPM Solution Catalog will be hosted on our very popular Oracle Technology Network (OTN). To give you an idea of the scale of customer visibility, OTN today receives over 1Million hits per day from our business and developer community. We would like to invite you to list your Oracle BPM 11g solutions available today.In order to participate in this program, you need to do the following: Fill in the attached slide templates - #3 and #4 for each Oracle BPM 11g solution you would like to list on OTN.Please add links to whitepapers , videos, references to the specific solution in the template slide. We recommend that you create a landing page on your website for these linked artifacts and just point to the same from within the PowerPoint template. This will give you the flexibility to update the information as frequently as needed. If you have the particular solution in production or a reference available, please list them as well. Send the PowerPoint template slides (1 set of slides for each Oracle BPM solution) to [email protected]. In addition to having the opportunity to list your solutions on OTN for Oracle customers, you will have the chance to advertise your new wins/implementations/solutions in an Oracle Sponsored PM Webinar held every quarter. This program is targeted to go live by the end of summer 2012. At this point, we are targeting a soft launch in July end 2012 so send on your BPM solutions information as soon as possible. We would love to have your solution(s) listed in the “Oracle BPM Partner Solution Catalog” at the time of the launch. This will be a live repository so you can keep adding more solutions as they become available. If you have any questions, please feel free to contact us [email protected], Product Strategy Director, Oracle BPM , Phone +1 650.506.5486.Thank you and look forward to hearing from you. Oracle BPM team Process Accelerators Overview.pdf ProcessAcceleratorsDataSheet.pdf Demos draUPK.zip & trmUPK.zip BPM Solution repository slides.ppt Additional BPM material BPM Process Development Lifecycle Document that describes recommended approach to collaborative process modeling across business and IT tools ADF 11g PS5 Application with Customized BPM Worklist Task Flow (MDS Seeded Customization) by Andrejus Baranovskis BPMN process editor problems in 11.1.1.6 by Mark Nelson BPM – Disable DBMS job to refresh B2B Materialized View by Mark Nelson For the complete kit please visit the BPM folder at our SOA Community Workspace (SOA Community membership required). For the complete presentation please visit our SOA Community Workspace (SOA Community membership required). Information is Oracle and Partner confidential! Back to top PLAQUES SOA & BPM SPECIALIZED We continue to offer you a nice SOA & BPM Specialization plaque with your logo to proof your success. If you are a SOA or BPM Specialized partner and would like to request the plaque please send Brigitte an e-mail with the following information: Partner Name Partner logo (preferred eps file) Partner Status gold or platinum Your shipping address Your Specialization: SOA or BPM We recommend to mount the plaque at your office reception in addition you can use the SOA Specialization logos at your website download Logo: Gold & Platinum or the BPM logos Gold & Platinum Back to top SOA & BPM AT VIRTUAL DEVELOPER DAY Register now for this FREE hands-on online workshop Get up to date and learn everything you wanted to know about Oracle ADF & Fusion Development plus live Q&A chats with Oracle technical staffOracle Application Development Framework (ADF) is the standards based, strategic framework for Oracle Fusion Applications and Oracle Fusion Middleware. Oracle ADF’s integration with the Oracle SOA Suite, Oracle WebCenter and Oracle BI creates a complete productive development platform for your custom applications.Join us at this FREE virtual event and learn the latest in Fusion Development including: Is Oracle ADF development faster and simpler than Forms, Apex or .Net? Mobile Application Development with ADF Mobile Oracle ADF development with Eclipse Oracle WebCenter Portal and ADF Development Application Lifecycle Management with ADF Building Process Centric Applications with ADF and BPM Oracle Business Intelligence and ADF Integration Live Q&A chats with Oracle technical staff Developer lead, manager or architect - this event has something for everyone. Don’t miss this opportunity.Tuesday, July 10, 2012. 9:00 a.m. PT -1:00 p.m. PT 11:00 a.m. CT - 3:00 p.m. CT 12:00 p.m. ET - 4:00 p.m. ET 1:00 p.m. BRT - 5:00 p.m. BRT Register online now! for this FREE event. Agenda: 09:00 am Opening 09:30 am Keynote: Oracle Fusion Development Track1Introduction to Fusion Development Track2What's New in Fusion Development Track3Fusion Development in the Enterprise 10:00 am Is Oracle ADF Development Faster and Simpler than Oracle Forms, APEX or .Net? Mobile Application Development with ADF Mobile Oracle WebCenter Portal and ADF Development 11:00 am Rich Web UI made simple - an ADF Faces Overview Oracle Enterprise Pack for Eclipse - ADF Development Building Process Centric Applications with ADF and BPM 12:00 noon Next Generation Controller for JSF Application Lifecycle Management for ADF Oracle Business Intelligence and ADF Integration *Hands On Lab – WebCenter and ADF Lab w/ JDeveloper - Lab materials will be provided ahead of the event to give you ample time to work through the lab and increase the productivity of the live chat sessions the day of the event. Sessions abstractsRegister online now! for this FREE event Read more on Community Events and post your comment here. Back to top NEWS FROM OUR PARTNERS AND COMMUNITY Send your tweets @soacommunity #soacommunity and follow us at http://twitter.com/soacommunity JDeveloper & ADF?Troubleshooting BPMN process editor problems in 11.1.1.6http://dlvr.it/1p0FfS SOA Community?SOA & BPM @ Virtual Developer Day: Oracle Fusion Development - July 10th 2012https://soacommunity.wordpress.com/2012/07/02/soa-bpm-virtual-developer-day- oracle-fusion-developmentjuly-10th-2012/#soacommunity #soa #bom #education orclateamsoa ?A-Team Blog #ateam: BAM design pointers - In working recently with a large Oracle customer on SOA and BAM, I discove.http://ow.ly/1kYqES SOA CommunitySOA Community Newsletter June 2012http://wp.me/p10C8u-qw SOA CommunityBPMN process editor problems in 11.1.1.6 by Mark Nelsonhttp://redstack.wordpress.com/2012/06/27/ bpmn-process-editor-problems-in-11-1-1-6 #soacommunity #bpm OTNArchBeat ?SOA Learning Library: free short, topic-focused training on Oracle SOA & BPM products | @SOACommunity http://pub.vitrue.com/NE1G Andrejus Baranovskis ?ADF 11g PS5 Application with Customized BPM Worklist Task Flow (MDS Seeded Customization)http://fb.me/1coX4r1X1 SOA CommunitySOA Learning Library provides a comprehensive curriculum for the SOA and BPM product suites https://soacommunity.wordpress.com/2012/06/27/soa-learning-library #soacommunity #soa #bpm OTNArchBeat ?A Universal JMX Client for Weblogic - Part 1: Monitoring BPEL Thread Pools in SOA 11g | Stefan Koserhttp://pub.vitrue.com/mQVZ OTNArchBeat ?BPM - Disable DBMS job to refresh B2B Materialized View | Mark Nelson http://pub.vitrue.com/3PR0Oracle SOA ?Learn how Choice Hotels Implements Innovative Google Maps Solution with #OracleSOA http://bit.ly/MTwIJ3 SOA Communitytop Tweets SOA Partner Community - June 2012 Send your tweets @soacommunity #soacommunity https://soacommunity.wordpress.com/2012/06/25/top-tweets-soa-partner-community-june-2012 Torsten Winterberg#OPITZ is pushing Oracle commitment to the next level: New Specializations done: ADF, BPM, WLS, Exadatahttp://bit.ly/KX1WVS ServiceTechSymposium ?Only 8 more days left until Super Early Bird Registration Discount expires! http://www.servicetechsymposium.com OracleBlogsSOA Management in 3 minutes - Video explainerhttp://ow.ly/1kN5pn SOA Community ?SOA, Cloud & Service Technology Symposium 2012 London - Enter Promo Code: Djmxz370https://soacommunity.wordpress.com/2012/06/22/soa-cloud-service-technology-symposium-2012-london #soasymposium #soacommunity #soa Heidi BuelowGreat course! w David Read RT @soacommunity: product management ADF for BPM training 5 seats left https://soacommunity.wordpress.com/2012/06/12/fusion-middleware-summer-campsadvanced-partner-trainings/ #bpm #soacommunity SOA Community ?product management ADF for BPM training 5 seats lefthttps://soacommunity.wordpress.com/2012/06/12/fusion-middleware-summer-campsadvanced-partner-trainings/ #bpm #soacommunity OTNArchBeat ?Oacle Fusion Applications Design Patterns Now Available For Developers | Ultan O'Broinhttp://pub.vitrue.com/UEiF OTNArchBeat ?SOA, Cloud & Service Technology Symposium 2012London - Special Oracle Discounthttp://pub.vitrue.com/8E0J SOA CommunityBecome a facebook fan of soacommunity http://www.facebook.com/soacommunity #soacommunity SOA Community ?SOA Suite HealthCare Integration Architecture https://blogs.oracle.com/SOAForHealthcare/entry/soa_suite_healthcare_integration_architecture #soacommunity #soa Andrejus Baranovskis ?Running Pre-built Virtual Machine for SOA Suite and BPM Suite 11g PS5 on Mac OS X Snow Leopard (10.6http://fb.me/vB8nO0Vg OracleBlogsPrinciples of Service-Oriented Architecture by Douwe P. van den Bos http://ow.ly/1kIcOP OTNArchBeatOracle Public Cloud Architecture | @TylerJewell http://ow.ly/bHAcL The SOA Network ?Business Process Management, Service-Oriented Architecture, and Web 2.0: Business Transformation or.http://bit.ly/LBgREL #ITNews #SOA OracleBlogs ?Oracle SOA Foundation Practitioner Certificationhttp://ow.ly/1kGYYg Frank Nimphius ?Learn Advanced ADF. ORACLE Fusion Middleware Summer Camps in Lisbon - July 9th - 13thhttp://bit.ly/KGCl3i SOA CommunityTransform Your Application Integration with Best Practices from Oracle Customershttps://blogs.oracle.com/SOA/entry/transform_your_application_integration_with #soacommunity #soa #bpm Simone GeibWhat you always wanted to know about #oraclesoa diagnostics: Shawn Bailey, Overview of SOA Diagnostics in 11.1.1.6,http://ow.ly/bxK0M Oracle SOA ?Save the date: Jun 21 10AM, SOA & BPM Customer Insight Series. Hear how Choice Hotels went from legacy to #oraclesoa http://bit.ly/LsNDGl OTNArchBeat ?New VirtualBox images for Oracle SOA Suite & Oracle BPM Suite 11.1.1.6.0http://ow.ly/bwDAl OracleBlogs ?Process development lifecycle in Oracle BPM 11g http://ow.ly/1ktesY Daniel AmadeiNew post: Oracle BPEL 11g Message Delivery & Recovery.http://amadei.com.br/blog/index.php /oracle-bpel-11g-message-delivery SOA Community ?Sending out the June edition of the #soacommunity newsletter - read it or become a member http://www.oracle.com/goto/emea/soa!#soa #bpm Arun Pareek ?For the past six months Ahmed Aboulnaga and me have been working on Oracle SOA Suite 11g Administrator's Handbook.http://lnkd.in/CAvpUQ SOA CommunitySun shine all day no clouds - solar eclipse is over... #sunshine #cloud http://www.infoq.com/presentations/Swarm-Computing Michel SchildmeijerWatch my blog Oracle Service Bus 11g: listing projects and services with WLST - part 1 http://lnkd.in/B7f3GQ @TITAN_GS @wlscommunity OTNArchBeatBook Review: Oracle Application Integration Architecture (AIA) Foundation Pack 11gR1: Essentials | Rajesh Rahejahttp://ow.ly/bn2cc OTNArchBeat ?Driving from Business Architecture to Business Process Services | @vghariharan http://ow.ly/bn5UB OTNArchBeat ?SOA Analysis within the Department of Defense Architecture Framework (DoDAF) 2.0 - Part II | Dawit Lessanu http://ow.ly/bn6sX Simone Geib ?Contact me directly for ideas how to improvehttp://bit.ly/advancedsoasuite and additional posts, presentations, white papers, ... #soasuite Simone Geib ?#soasuite advanced OTN page has become too cluttered. Broke it into separate pages to start with. http://bit.ly/advancedsoasuite OracleBlogs ?June Webcast: SOA Gateway Implementation and Troubleshooting (2 sessions) http://ow.ly/1kbRFA ServiceTechSymposium ?New session just posted to calendar: "NoSQL for Data Services, Data Virtualization & Big Data" by Guido Schmutz, Trivadis AG ://ow.ly/bjjOeDebra Lilley ?looks good - real proof people are using the apps ! RT @fteter: Very cool Fusion Applications Help site: http://bit.ly/L3nvOR #FusionApps demed ?rapid proliferation of cloud computing will drive convergence of SOA and cloud paradigms" http://ovum.com/2012/05/18/soa-paves-the-way-for-cloud/ SOA CommunityMiddleware Oracle Excellence Awards 2012-HAPPY NEW YEAR! https://soacommunity.wordpress.com/2012/05/31/middleware-oracle-excellence-awards-2012happy-new-year/ #soacommunity #opn #opnaward #specialization #oracle SOA CommunityHappy New Year #soacommunity thanks for the business! Time for a drink http://pic.twitter.com/zkK08KWB OTNArchBeat ?Who should ‘own’ the Enterprise Architecture? | Michael Glas http://bit.ly/K0ge0Q SOA Communitytop Tweets SOA Partner Community – May 2012 http://wp.me/p10C8u-pP ServiceTechSymposiumNew session just posted to Symposium calendar: "Elastic SOA in the Cloud" by Steve Millidge, C2B2 Consulting http://www.servicetechsymposium.com/agenda2012.php #elastic_soa_in_the_cloud orclateamsoa ?A-Team Blog #ateam: How to Set JVM Parameters in Oracle SOA 11Ghttp://ow.ly/1k2cnl ServiceTechSymposium ?New session just posted to Symposium calendar: "SOA Governance at EDP: A Global Energy Company" by Manuel Rosa, Linkhttp://www.servicetechsymposium.com/agenda2012.php#soa_governance_at_edp SOA Community ?VirtualBox image SOA Suite & BPM Suite 11.1.1.6.0–Your feedback?http://wp.me/p10C8u-qh Oracle MiddlewareSave the date: Jun 21 10AM, SOA & BPM Customer Insight Series. Hear how Choice Hotels went from legacy to#oraclesoa http://bit.ly/LU1y5N OTNArchBeat ?Goodbye, Silos. Hello SOA. | @stephanieoverbyhttp://pub.vitrue.com/NJJO SOA CommunityBPM Standard Edition - to start your BPM project http://wp.me/p10C8u-qj Please feel free to send us your news! And add your blog to our SOA blog wiki. Back to top OVERVIEW OF SOA DIAGNOSTICS IN 11.1.1.6 What tools are available for diagnosing SOA Suite issues? There are a variety of tools available to help you and Support diagnose SOA Suite issues in 11g but it can be confusing as to which tool is appropriate for a particular situation and what their relationships are. This blog post will introduce the various tools and attempt to clarify what each is for and how they are related. Let's first list the tools we'll be addressing: RDA: Remote Diagnostic Agent DFW: Diagnostic Framework Selective Tracing DMS: Dynamic Monitoring Service ODL: Oracle Diagnostic Logging ADR: Automatic Diagnostics Repository ADRCI: Automatic Diagnostics Repository Command Interpreter WLDF: WebLogic Diagnostic Framework This overview is not mean to be a comprehensive guide on using all of these tools, however, extensive reference materials are included that will provide many more details on their execution. Another point to note is that all of these tools are applicable for Fusion Middleware as a whole but specific products may or may not have implemented features to leverage them. A couple of the tools have a WebLogic Scripting Tool or 'WLST' interface. WLST is a command interface for executing pre-built functions and custom scripts against a domain. A detailed WLST tutorial is beyond the scope of this post but you can find general information here. There are more specific resources in the below sections.In this post when we refer to 'Enterprise Manager' or 'EM' we are referring to Enterprise Manager Fusion Middleware Control. read the full blog post here. Read more on Oracle and post your comment here. Back to top BUSINESS DRIVEN DEVELOPMENT (BDD) DEMO NOW AVAILABLE! For access to the Oracle demo systems please visit OPN and talk to your Partner Expert DSS is pleased to announce the availability of the demo “Business Driven Development“. This innovative demonstration uses a case-study approach to show business users how they can easily streamline their Business Processes - delivering greater efficiency, agility, visibility and collaboration with Oracle BPM and WebCenter. The BDD demonstration uses a case study-based approach to highlight a business problem at a fictional company, Avitek Corporation, and uses Oracle BPM and Oracle WebCenter to solve the business problem. This holistic approach has specifically been used to appeal to a non-technical business analyst user. This demo is NOT focused on product features, but aims to guide users through a complete BPM lifecycle. The scenario is based on improving a simple order process (scenario details are in the demo script). Avitek Corporation is sufferinng from a manual email-driven ordering process. Sales reps don’t know where the customer orders are stuck (no visibility) and finance users are unable to manually approve every order (no automation). There are several areas where this process can be improved with Business Process Management technology. This demo shows how improving following areas will ignificantly help resolve the business problems Avitek Corporation is facing. Areas for improvement include: Utilizing BPM for process management, rather than an unregulated, email-based process. Utilizing automated services, rather than requiring a human to key into a system. For example, Finance checking the customer’s credit rating is something that could be automated. Centralizing business rules that can be integrated into a business process, rather than requiring a human to process them. For example, Finance must determine when orders can be automatically approved. Provide insight and visibility into the process. For example, Sales Rep needs to know the status of their customer’s orders. The BDD Demo uses the following products. Oracle BPM Suite 11g PS4FP Oracle WebCenter 11g PS4FP (for Process Spaces) Oracle Business Activity Monitoring 11g Oracle Database 11g Back to top SOA LIFECYCLE MANAGEMENT For access to the Oracle demo systems please visit OPN and talk to your Partner Expert We are pleased to announce the availability of the SOA Management demo that showcases some of the key provisioning and lifecycle management capabilities of SOA Management Pack Enterprise Edition (EE). This demo specifically focuses on some of the lifecycle management solutions for Oracle SOA Suite and Oracle Service Bus (OSB). Demo Highlights The demo showcases the following capabilities. Provisioning of SOA Composites Provisioning of OSB Projects Provision SOA and OSB artifacts in a future maintenance window Back to top ORACLE FUSION APPLICATIONS DESIGN PATTERNS The Oracle Fusion Applications user experience design patterns are published! These new, reusable usability solutions and best-practices, which will join the Oracle dashboard patterns and guidelines that are already available online, are used by Oracle to artfully bring to life a new standard in the user experience, or UX, of enterprise applications. Now, the Oracle applications development community can benefit from the science behind the Oracle Fusion Applications user experience, too. These Oracle Fusion Applications UX Design Patterns, or blueprints, enable Oracle applications developers and system implementers everywhere to leverage professional usability insight when: tailoring an Oracle Fusion application, creating coexistence solutions that existing users will be delighted with, thus enabling graceful user transitions to Oracle Fusion Applications down the road, or designing exciting, new, highly usable applications in the cloud or on-premise. Based on the Oracle Application Development Framework (ADF) components, the Oracle Fusion Applications patterns and guidelines are proven with real users and in the Applications UX usability labs, so you can get right to work coding productivity-enhancing designs that provide an advantage for your entire business. What’s the best way to get started? We’ve made that easy, too. The Design Filter Tool (DeFT) selects the best pattern for your user type and task. Simply adapt your selection for your own task flow and content, and you’re on your way to a really great applications user experience. More Oracle applications design patterns and training are coming your way in the future. To provide feedback on the sets that are currently available, let me know in the comments! Read more on Fusionapps and post your comment here. Back to top UPDATED ORACLE MATERIAL Integrated SOA Gateway Documentation - Implementation Guide | Developer’s Guide Webcast Series: Oracle’s SOA and Oracle Business Process Management Solutions (Choice Hotels, Eaton, Farmers Insurance) BAM design pointers By Kavitha Srinivasan Seeking Oracle Fusion Middleware Go Live StoriesOracle Fusion Middleware product management is looking for recent go live stories to share with the Oracle sales team, sales consulting, product management and other internal groups. Customer contact details may remain anonymous. Your successful implementation will be featured in a quarterly report. The chance to present on an internal webcast is also available. Contact Maria Forney ([email protected]) if you have a noteworthy implementation success story. This is a good opportunity for partners interested in showcasing Oracle Fusion Middleware implementations, and gaining more exposure within Oracle. Performance tuning resources. All in one: docs, blogs, WPs, ppts: http://bit.ly/soa_resources Back to top HAVE YOU MISSED OUR LAST SOA PARTNER COMMUNITY WEBCASTS? UPK Webcast Business Driven Application Management & BPM11g & Application Grid & GoldenGate & Fusion Middleware Pricing & OC4J to WebLogic & Next Generation SOA & Fusion Middleware in Utility & Fusion Middleware in Communications & Fusion Middleware in Public Services & Fusion Middleware in Financial Services Please check your local OPN trainings calendar for additional training dates and locations. Back to top SOA PARTNER COMMUNITY CALENDAR On-Demand Trainings Event Name Language Type SOA Virtual Developers Day English Tech In-Class Trainings Date Event name Location / Country Contact person Type 09-13.07.2012 BPM Suite 11g advanced training by David Read Lisbon, Portugal Jürgen Kress Tech 09-13.07.2012 ADF 11g advanced training by Grant Ronald and Frank Nimphius Lisbon, Portugal Jürgen Kress Tech 09-13.07.2012 WebCenter Portal advanced training by Stefan Krantz and Angelo Santagata Lisbon, Portugal Jürgen Kress Tech 10.07.2012 Fusion Middleware Virtual Developer Day Online OTN Tech 10- 12.07.2012 WebLogic 12c training by Cosmin Tudor Lisbon, Portugal Jürgen Kress Tech 16-18.07.2012 SOA Suite 11g advanced training by Niall Commiskey Munich, Germany Jürgen Kress Tech 16-18.07.2012 ADF for BPM Suite 11g advanced training by David Read Munich, Germany Jürgen Kress Tech 16-18.07.2012 WebCenter Sites 11g advanced training by Product Management Munich, Germany Jürgen Kress Tech 17-20.07.2012 Oracle BPM 11g Implementation Bootcamp Live Virtual Class Oracle University Tech 23-26.07.2012 Oracle BPM 11g Implementation Bootcamp Utrecht, Netherlands Oracle University Tech 29-31.08.2012 Oracle BPM 11g Implementation Bootcamp Live Virtual Class Oracle University Tech 02-05.10.2012 Oracle BPM 11g Implementation Bootcamp Utrecht, Netherlands Oracle University Tech 15-18.10.2012 Oracle BPM 11g Implementation Bootcamp Utrecht, Netherlands Oracle University Tech 28-30.11.2012 Oracle AIA 11g Implementation Bootcamp Live Virtual Class Oracle University Tech 11-14.12.2012 Oracle BPM 11g Implementation Bootcamp Live Virtual Class Oracle University Tech 20-22.2.2013 Oracle AIA 11g Implementation Bootcamp Utrecht, Netherlands Oracle University Tech 14-17.1.2013 Oracle BPM 11g Implementation Bootcamp Utrecht, Netherlands Oracle University Tech 15-18.3.2013 Oracle BPM 11g Implementation Bootcamp Utrecht, Netherlands Oracle University Tech Please check your local OPN Training Calendar for additional training and locations here. Back to top SOASCHOOL.COM - SOA CERTIFIED PROFESSIONAL(SOACP) PROGRAM The SOASchool.com - SOA Certified Professional (SOACP) program is dedicated to excellence in the field of SOA and service-oriented computing. Through a series of seasoned course modules and exams, IT professionals have the opportunity to obtain a number of different certifications to recognize their accomplishment of gaining "project ready" SOA proficiency. This comprehensive and strictly vendor-neutral program was developed in cooperation with best-selling SOA author Thomas Erl and several major SOA organizations and academic institutions. Through the involvement of the SOA Education Committee, course contents and certification requirements are constantly reviewed and revised to stay current with developments in the service-oriented computing industry. The program is currently comprised of 12 course modules and 5 certifications and is expanding to 18 course modules and 8 certifications throughout 2009. For more information, visit www.soaschool.com and www.soacp.com. Blog Twitter LinkedIn Mix Forum Wiki Back to top YOUR CONTENT ON THE NEWSLETTER AND ON THE SOA COMMUNITY PORTAL Publishing Your StoriesWe would like to invite our partners to publish information in the newsletter or on our SOA Community portal. Especially we are looking for your real life experience with our SOA technology. Please send your documents to Jürgen Kress. We look forward to getting your suggestions! Back to top SOA DISCUSSION FORUM BECOMES INTERACTIVE AT THE SOA COMMUNITY! Do you want to chat to experts, including partners and Oracle SOA Product Development? Do you want to get the latest information about our SOA solutions and events?Attend our private online SOA Discussion Forum at OTN. Please send your OTN forums user name to Brigitte Felisaz. You must be a registered user to access the SOA Discussion Forum. Back to top INVITE YOUR COLLEAGUES TO JOIN THE SOA COMMUNITY Please feel free to invite your colleagues to join the SOA Community and to participate in the SOA Assessment tests. For registration please login the Oracle PartnerNetwork and go to: www.oracle.com/goto/emea/soa For any questions on the above or concerning SOA and Oracle in general please contact the Oracle EMEA Alliances & Channels SOA Team. Best regardsOracle EMEA SOA TeamJürgen Kress Jürgen KressSOA Partner Adoption EMEATel. +49 89 1430 1479E-Mail: [email protected]

Read the article

Upload File to Windows Azure Blob in Chunks through ASP.NET MVC, JavaScript and HTML5

- by Shaun

Originally posted on: http://geekswithblogs.net/shaunxu/archive/2013/07/01/upload-file-to-windows-azure-blob-in-chunks-through-asp.net.aspxMany people are using Windows Azure Blob Storage to store their data in the cloud. Blob storage provides 99.9% availability with easy-to-use API through .NET SDK and HTTP REST. For example, we can store JavaScript files, images, documents in blob storage when we are building an ASP.NET web application on a Web Role in Windows Azure. Or we can store our VHD files in blob and mount it as a hard drive in our cloud service. If you are familiar with Windows Azure, you should know that there are two kinds of blob: page blob and block blob. The page blob is optimized for random read and write, which is very useful when you need to store VHD files. The block blob is optimized for sequential/chunk read and write, which has more common usage. Since we can upload block blob in blocks through BlockBlob.PutBlock, and them commit them as a whole blob with invoking the BlockBlob.PutBlockList, it is very powerful to upload large files, as we can upload blocks in parallel, and provide pause-resume feature. There are many documents, articles and blog posts described on how to upload a block blob. Most of them are focus on the server side, which means when you had received a big file, stream or binaries, how to upload them into blob storage in blocks through .NET SDK. But the problem is, how can we upload these large files from client side, for example, a browser. This questioned to me when I was working with a Chinese customer to help them build a network disk production on top of azure. The end users upload their files from the web portal, and then the files will be stored in blob storage from the Web Role. My goal is to find the best way to transform the file from client (end user’s machine) to the server (Web Role) through browser. In this post I will demonstrate and describe what I had done, to upload large file in chunks with high speed, and save them as blocks into Windows Azure Blob Storage. Traditional Upload, Works with Limitation The simplest way to implement this requirement is to create a web page with a form that contains a file input element and a submit button. 1: @using (Html.BeginForm("About", "Index", FormMethod.Post, new { enctype = "multipart/form-data" })) 2: { 3: <input type="file" name="file" /> 4: <input type="submit" value="upload" /> 5: } And then in the backend controller, we retrieve the whole content of this file and upload it in to the blob storage through .NET SDK. We can split the file in blocks and upload them in parallel and commit. The code had been well blogged in the community. 1: [HttpPost] 2: public ActionResult About(HttpPostedFileBase file) 3: { 4: var container = _client.GetContainerReference("test"); 5: container.CreateIfNotExists(); 6: var blob = container.GetBlockBlobReference(file.FileName); 7: var blockDataList = new Dictionary<string, byte[]>(); 8: using (var stream = file.InputStream) 9: { 10: var blockSizeInKB = 1024; 11: var offset = 0; 12: var index = 0; 13: while (offset < stream.Length) 14: { 15: var readLength = Math.Min(1024 * blockSizeInKB, (int)stream.Length - offset); 16: var blockData = new byte[readLength]; 17: offset += stream.Read(blockData, 0, readLength); 18: blockDataList.Add(Convert.ToBase64String(BitConverter.GetBytes(index)), blockData); 19: 20: index++; 21: } 22: } 23: 24: Parallel.ForEach(blockDataList, (bi) => 25: { 26: blob.PutBlock(bi.Key, new MemoryStream(bi.Value), null); 27: }); 28: blob.PutBlockList(blockDataList.Select(b => b.Key).ToArray()); 29: 30: return RedirectToAction("About"); 31: } This works perfect if we selected an image, a music or a small video to upload. But if I selected a large file, let’s say a 6GB HD-movie, after upload for about few minutes the page will be shown as below and the upload will be terminated. In ASP.NET there is a limitation of request length and the maximized request length is defined in the web.config file. It’s a number which less than about 4GB. So if we want to upload a really big file, we cannot simply implement in this way. Also, in Windows Azure, a cloud service network load balancer will terminate the connection if exceed the timeout period. From my test the timeout looks like 2 - 3 minutes. Hence, when we need to upload a large file we cannot just use the basic HTML elements. Besides the limitation mentioned above, the simple HTML file upload cannot provide rich upload experience such as chunk upload, pause and pause-resume. So we need to find a better way to upload large file from the client to the server. Upload in Chunks through HTML5 and JavaScript In order to break those limitation mentioned above we will try to upload the large file in chunks. This takes some benefit to us such as - No request size limitation: Since we upload in chunks, we can define the request size for each chunks regardless how big the entire file is. - No timeout problem: The size of chunks are controlled by us, which means we should be able to make sure request for each chunk upload will not exceed the timeout period of both ASP.NET and Windows Azure load balancer. It was a big challenge to upload big file in chunks until we have HTML5. There are some new features and improvements introduced in HTML5 and we will use them to implement our solution. In HTML5, the File interface had been improved with a new method called “slice”. It can be used to read part of the file by specifying the start byte index and the end byte index. For example if the entire file was 1024 bytes, file.slice(512, 768) will read the part of this file from the 512nd byte to 768th byte, and return a new object of interface called "Blob”, which you can treat as an array of bytes. In fact, a Blob object represents a file-like object of immutable, raw data. The File interface is based on Blob, inheriting blob functionality and expanding it to support files on the user's system. For more information about the Blob please refer here. File and Blob is very useful to implement the chunk upload. We will use File interface to represent the file the user selected from the browser and then use File.slice to read the file in chunks in the size we wanted. For example, if we wanted to upload a 10MB file with 512KB chunks, then we can read it in 512KB blobs by using File.slice in a loop. Assuming we have a web page as below. User can select a file, an input box to specify the block size in KB and a button to start upload. 1: <div> 2: <input type="file" id="upload_files" name="files[]" /><br /> 3: Block Size: <input type="number" id="block_size" value="512" name="block_size" />KB<br /> 4: <input type="button" id="upload_button_blob" name="upload" value="upload (blob)" /> 5: </div> Then we can have the JavaScript function to upload the file in chunks when user clicked the button. 1: <script type="text/javascript"> 1: 2: $(function () { 3: $("#upload_button_blob").click(function () { 4: }); 5: });</script> Firstly we need to ensure the client browser supports the interfaces we are going to use. Just try to invoke the File, Blob and FormData from the “window” object. If any of them is “undefined” the condition result will be “false” which means your browser doesn’t support these premium feature and it’s time for you to get your browser updated. FormData is another new feature we are going to use in the future. It could generate a temporary form for us. We will use this interface to create a form with chunk and associated metadata when invoked the service through ajax. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: if (window.File && window.Blob && window.FormData) { 4: alert("Your brwoser is awesome, let's rock!"); 5: } 6: else { 7: alert("Oh man plz update to a modern browser before try is cool stuff out."); 8: return; 9: } 10: }); Each browser supports these interfaces by their own implementation and currently the Blob, File and File.slice are supported by Chrome 21, FireFox 13, IE 10, Opera 12 and Safari 5.1 or higher. After that we worked on the files the user selected one by one since in HTML5, user can select multiple files in one file input box. 1: var files = $("#upload_files")[0].files; 2: for (var i = 0; i < files.length; i++) { 3: var file = files[i]; 4: var fileSize = file.size; 5: var fileName = file.name; 6: } Next, we calculated the start index and end index for each chunks based on the size the user specified from the browser. We put them into an array with the file name and the index, which will be used when we upload chunks into Windows Azure Blob Storage as blocks since we need to specify the target blob name and the block index. At the same time we will store the list of all indexes into another variant which will be used to commit blocks into blob in Azure Storage once all chunks had been uploaded successfully. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: 11: // calculate the start and end byte index for each blocks(chunks) 12: // with the index, file name and index list for future using 13: var blockSizeInKB = $("#block_size").val(); 14: var blockSize = blockSizeInKB * 1024; 15: var blocks = []; 16: var offset = 0; 17: var index = 0; 18: var list = ""; 19: while (offset < fileSize) { 20: var start = offset; 21: var end = Math.min(offset + blockSize, fileSize); 22: 23: blocks.push({ 24: name: fileName, 25: index: index, 26: start: start, 27: end: end 28: }); 29: list += index + ","; 30: 31: offset = end; 32: index++; 33: } 34: } 35: }); Now we have all chunks’ information ready. The next step should be upload them one by one to the server side, and at the server side when received a chunk it will upload as a block into Blob Storage, and finally commit them with the index list through BlockBlobClient.PutBlockList. But since all these invokes are ajax calling, which means not synchronized call. So we need to introduce a new JavaScript library to help us coordinate the asynchronize operation, which named “async.js”. You can download this JavaScript library here, and you can find the document here. I will not explain this library too much in this post. We will put all procedures we want to execute as a function array, and pass into the proper function defined in async.js to let it help us to control the execution sequence, in series or in parallel. Hence we will define an array and put the function for chunk upload into this array. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: 5: // start to upload each files in chunks 6: var files = $("#upload_files")[0].files; 7: for (var i = 0; i < files.length; i++) { 8: var file = files[i]; 9: var fileSize = file.size; 10: var fileName = file.name; 11: // calculate the start and end byte index for each blocks(chunks) 12: // with the index, file name and index list for future using 13: ... ... 14: 15: // define the function array and push all chunk upload operation into this array 16: blocks.forEach(function (block) { 17: putBlocks.push(function (callback) { 18: }); 19: }); 20: } 21: }); 22: }); As you can see, I used File.slice method to read each chunks based on the start and end byte index we calculated previously, and constructed a temporary HTML form with the file name, chunk index and chunk data through another new feature in HTML5 named FormData. Then post this form to the backend server through jQuery.ajax. This is the key part of our solution. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: blocks.forEach(function (block) { 15: putBlocks.push(function (callback) { 16: // load blob based on the start and end index for each chunks 17: var blob = file.slice(block.start, block.end); 18: // put the file name, index and blob into a temporary from 19: var fd = new FormData(); 20: fd.append("name", block.name); 21: fd.append("index", block.index); 22: fd.append("file", blob); 23: // post the form to backend service (asp.net mvc controller action) 24: $.ajax({ 25: url: "/Home/UploadInFormData", 26: data: fd, 27: processData: false, 28: contentType: "multipart/form-data", 29: type: "POST", 30: success: function (result) { 31: if (!result.success) { 32: alert(result.error); 33: } 34: callback(null, block.index); 35: } 36: }); 37: }); 38: }); 39: } 40: }); Then we will invoke these functions one by one by using the async.js. And once all functions had been executed successfully I invoked another ajax call to the backend service to commit all these chunks (blocks) as the blob in Windows Azure Storage. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.series(putBlocks, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); That’s all in the client side. The outline of our logic would be - Calculate the start and end byte index for each chunks based on the block size. - Defined the functions of reading the chunk form file and upload the content to the backend service through ajax. - Execute the functions defined in previous step with “async.js”. - Commit the chunks by invoking the backend service in Windows Azure Storage finally. Save Chunks as Blocks into Blob Storage In above we finished the client size JavaScript code. It uploaded the file in chunks to the backend service which we are going to implement in this step. We will use ASP.NET MVC as our backend service, and it will receive the chunks, upload into Windows Azure Bob Storage in blocks, then finally commit as one blob. As in the client side we uploaded chunks by invoking the ajax call to the URL "/Home/UploadInFormData", I created a new action under the Index controller and it only accepts HTTP POST request. 1: [HttpPost] 2: public JsonResult UploadInFormData() 3: { 4: var error = string.Empty; 5: try 6: { 7: } 8: catch (Exception e) 9: { 10: error = e.ToString(); 11: } 12: 13: return new JsonResult() 14: { 15: Data = new 16: { 17: success = string.IsNullOrWhiteSpace(error), 18: error = error 19: } 20: }; 21: } Then I retrieved the file name, index and the chunk content from the Request.Form object, which was passed from our client side. And then, used the Windows Azure SDK to create a blob container (in this case we will use the container named “test”.) and create a blob reference with the blob name (same as the file name). Then uploaded the chunk as a block of this blob with the index, since in Blob Storage each block must have an index (ID) associated with so that finally we can put all blocks as one blob by specifying their block ID list. 1: [HttpPost] 2: public JsonResult UploadInFormData() 3: { 4: var error = string.Empty; 5: try 6: { 7: var name = Request.Form["name"]; 8: var index = int.Parse(Request.Form["index"]); 9: var file = Request.Files[0]; 10: var id = Convert.ToBase64String(BitConverter.GetBytes(index)); 11: 12: var container = _client.GetContainerReference("test"); 13: container.CreateIfNotExists(); 14: var blob = container.GetBlockBlobReference(name); 15: blob.PutBlock(id, file.InputStream, null); 16: } 17: catch (Exception e) 18: { 19: error = e.ToString(); 20: } 21: 22: return new JsonResult() 23: { 24: Data = new 25: { 26: success = string.IsNullOrWhiteSpace(error), 27: error = error 28: } 29: }; 30: } Next, I created another action to commit the blocks into blob once all chunks had been uploaded. Similarly, I retrieved the blob name from the Request.Form. I also retrieved the chunks ID list, which is the block ID list from the Request.Form in a string format, split them as a list, then invoked the BlockBlob.PutBlockList method. After that our blob will be shown in the container and ready to be download. 1: [HttpPost] 2: public JsonResult Commit() 3: { 4: var error = string.Empty; 5: try 6: { 7: var name = Request.Form["name"]; 8: var list = Request.Form["list"]; 9: var ids = list 10: .Split(',') 11: .Where(id => !string.IsNullOrWhiteSpace(id)) 12: .Select(id => Convert.ToBase64String(BitConverter.GetBytes(int.Parse(id)))) 13: .ToArray(); 14: 15: var container = _client.GetContainerReference("test"); 16: container.CreateIfNotExists(); 17: var blob = container.GetBlockBlobReference(name); 18: blob.PutBlockList(ids); 19: } 20: catch (Exception e) 21: { 22: error = e.ToString(); 23: } 24: 25: return new JsonResult() 26: { 27: Data = new 28: { 29: success = string.IsNullOrWhiteSpace(error), 30: error = error 31: } 32: }; 33: } Now we finished all code we need. The whole process of uploading would be like this below. Below is the full client side JavaScript code. 1: <script type="text/javascript" src="~/Scripts/async.js"></script> 2: <script type="text/javascript"> 3: $(function () { 4: $("#upload_button_blob").click(function () { 5: // assert the browser support html5 6: if (window.File && window.Blob && window.FormData) { 7: alert("Your brwoser is awesome, let's rock!"); 8: } 9: else { 10: alert("Oh man plz update to a modern browser before try is cool stuff out."); 11: return; 12: } 13: 14: // start to upload each files in chunks 15: var files = $("#upload_files")[0].files; 16: for (var i = 0; i < files.length; i++) { 17: var file = files[i]; 18: var fileSize = file.size; 19: var fileName = file.name; 20: 21: // calculate the start and end byte index for each blocks(chunks) 22: // with the index, file name and index list for future using 23: var blockSizeInKB = $("#block_size").val(); 24: var blockSize = blockSizeInKB * 1024; 25: var blocks = []; 26: var offset = 0; 27: var index = 0; 28: var list = ""; 29: while (offset < fileSize) { 30: var start = offset; 31: var end = Math.min(offset + blockSize, fileSize); 32: 33: blocks.push({ 34: name: fileName, 35: index: index, 36: start: start, 37: end: end 38: }); 39: list += index + ","; 40: 41: offset = end; 42: index++; 43: } 44: 45: // define the function array and push all chunk upload operation into this array 46: var putBlocks = []; 47: blocks.forEach(function (block) { 48: putBlocks.push(function (callback) { 49: // load blob based on the start and end index for each chunks 50: var blob = file.slice(block.start, block.end); 51: // put the file name, index and blob into a temporary from 52: var fd = new FormData(); 53: fd.append("name", block.name); 54: fd.append("index", block.index); 55: fd.append("file", blob); 56: // post the form to backend service (asp.net mvc controller action) 57: $.ajax({ 58: url: "/Home/UploadInFormData", 59: data: fd, 60: processData: false, 61: contentType: "multipart/form-data", 62: type: "POST", 63: success: function (result) { 64: if (!result.success) { 65: alert(result.error); 66: } 67: callback(null, block.index); 68: } 69: }); 70: }); 71: }); 72: 73: // invoke the functions one by one 74: // then invoke the commit ajax call to put blocks into blob in azure storage 75: async.series(putBlocks, function (error, result) { 76: var data = { 77: name: fileName, 78: list: list 79: }; 80: $.post("/Home/Commit", data, function (result) { 81: if (!result.success) { 82: alert(result.error); 83: } 84: else { 85: alert("done!"); 86: } 87: }); 88: }); 89: } 90: }); 91: }); 92: </script> And below is the full ASP.NET MVC controller code. 1: public class HomeController : Controller 2: { 3: private CloudStorageAccount _account; 4: private CloudBlobClient _client; 5: 6: public HomeController() 7: : base() 8: { 9: _account = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("DataConnectionString")); 10: _client = _account.CreateCloudBlobClient(); 11: } 12: 13: public ActionResult Index() 14: { 15: ViewBag.Message = "Modify this template to jump-start your ASP.NET MVC application."; 16: 17: return View(); 18: } 19: 20: [HttpPost] 21: public JsonResult UploadInFormData() 22: { 23: var error = string.Empty; 24: try 25: { 26: var name = Request.Form["name"]; 27: var index = int.Parse(Request.Form["index"]); 28: var file = Request.Files[0]; 29: var id = Convert.ToBase64String(BitConverter.GetBytes(index)); 30: 31: var container = _client.GetContainerReference("test"); 32: container.CreateIfNotExists(); 33: var blob = container.GetBlockBlobReference(name); 34: blob.PutBlock(id, file.InputStream, null); 35: } 36: catch (Exception e) 37: { 38: error = e.ToString(); 39: } 40: 41: return new JsonResult() 42: { 43: Data = new 44: { 45: success = string.IsNullOrWhiteSpace(error), 46: error = error 47: } 48: }; 49: } 50: 51: [HttpPost] 52: public JsonResult Commit() 53: { 54: var error = string.Empty; 55: try 56: { 57: var name = Request.Form["name"]; 58: var list = Request.Form["list"]; 59: var ids = list 60: .Split(',') 61: .Where(id => !string.IsNullOrWhiteSpace(id)) 62: .Select(id => Convert.ToBase64String(BitConverter.GetBytes(int.Parse(id)))) 63: .ToArray(); 64: 65: var container = _client.GetContainerReference("test"); 66: container.CreateIfNotExists(); 67: var blob = container.GetBlockBlobReference(name); 68: blob.PutBlockList(ids); 69: } 70: catch (Exception e) 71: { 72: error = e.ToString(); 73: } 74: 75: return new JsonResult() 76: { 77: Data = new 78: { 79: success = string.IsNullOrWhiteSpace(error), 80: error = error 81: } 82: }; 83: } 84: } And if we selected a file from the browser we will see our application will upload chunks in the size we specified to the server through ajax call in background, and then commit all chunks in one blob. Then we can find the blob in our Windows Azure Blob Storage. Optimized by Parallel Upload In previous example we just uploaded our file in chunks. This solved the problem that ASP.NET MVC request content size limitation as well as the Windows Azure load balancer timeout. But it might introduce the performance problem since we uploaded chunks in sequence. In order to improve the upload performance we could modify our client side code a bit to make the upload operation invoked in parallel. The good news is that, “async.js” library provides the parallel execution function. If you remembered the code we invoke the service to upload chunks, it utilized “async.series” which means all functions will be executed in sequence. Now we will change this code to “async.parallel”. This will invoke all functions in parallel. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.parallel(putBlocks, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); In this way all chunks will be uploaded to the server side at the same time to maximize the bandwidth usage. This should work if the file was not very large and the chunk size was not very small. But for large file this might introduce another problem that too many ajax calls are sent to the server at the same time. So the best solution should be, upload the chunks in parallel with maximum concurrency limitation. The code below specified the concurrency limitation to 4, which means at the most only 4 ajax calls could be invoked at the same time. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.parallelLimit(putBlocks, 4, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); Summary In this post we discussed how to upload files in chunks to the backend service and then upload them into Windows Azure Blob Storage in blocks. We focused on the frontend side and leverage three new feature introduced in HTML 5 which are - File.slice: Read part of the file by specifying the start and end byte index. - Blob: File-like interface which contains the part of the file content. - FormData: Temporary form element that we can pass the chunk alone with some metadata to the backend service. Then we discussed the performance consideration of chunk uploading. Sequence upload cannot provide maximized upload speed, but the unlimited parallel upload might crash the browser and server if too many chunks. So we finally came up with the solution to upload chunks in parallel with the concurrency limitation. We also demonstrated how to utilize “async.js” JavaScript library to help us control the asynchronize call and the parallel limitation. Regarding the chunk size and the parallel limitation value there is no “best” value. You need to test vary composition and find out the best one for your particular scenario. It depends on the local bandwidth, client machine cores and the server side (Windows Azure Cloud Service Virtual Machine) cores, memory and bandwidth. Below is one of my performance test result. The client machine was Windows 8 IE 10 with 4 cores. I was using Microsoft Cooperation Network. The web site was hosted on Windows Azure China North data center (in Beijing) with one small web role (1.7GB 1 core CPU, 1.75GB memory with 100Mbps bandwidth). The test cases were - Chunk size: 512KB, 1MB, 2MB, 4MB. - Upload Mode: Sequence, parallel (unlimited), parallel with limit (4 threads, 8 threads). - Chunk Format: base64 string, binaries. - Target file: 100MB. - Each case was tested 3 times. Below is the test result chart. Some thoughts, but not guidance or best practice: - Parallel gets better performance than series. - No significant performance improvement between parallel 4 threads and 8 threads. - Transform with binaries provides better performance than base64. - In all cases, chunk size in 1MB - 2MB gets better performance. Hope this helps, Shaun All documents and related graphics, codes are provided "AS IS" without warranty of any kind. Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.

Read the article

Web Browser Control – Specifying the IE Version

- by Rick Strahl

I use the Internet Explorer Web Browser Control in a lot of my applications to display document type layout. HTML happens to be one of the most common document formats and displaying data in this format – even in desktop applications, is often way easier than using normal desktop technologies. One issue the Web Browser Control has that it’s perpetually stuck in IE 7 rendering mode by default. Even though IE 8 and now 9 have significantly upgraded the IE rendering engine to be more CSS and HTML compliant by default the Web Browser control will have none of it. IE 9 in particular – with its much improved CSS support and basic HTML 5 support is a big improvement and even though the IE control uses some of IE’s internal rendering technology it’s still stuck in the old IE 7 rendering by default. This applies whether you’re using the Web Browser control in a WPF application, a WinForms app, a FoxPro or VB classic application using the ActiveX control. Behind the scenes all these UI platforms use the COM interfaces and so you’re stuck by those same rules. Rendering Challenged To see what I’m talking about here are two screen shots rendering an HTML 5 doctype page that includes some CSS 3 functionality – rounded corners and border shadows - from an earlier post. One uses IE 9 as a standalone browser, and one uses a simple WPF form that includes the Web Browser control. IE 9 Browser: Web Browser control in a WPF form: The IE 9 page displays this HTML correctly – you see the rounded corners and shadow displayed. Obviously the latter rendering using the Web Browser control in a WPF application is a bit lacking. Not only are the new CSS features missing but the page also renders in Internet Explorer’s quirks mode so all the margins, padding etc. behave differently by default, even though there’s a CSS reset applied on this page. If you’re building an application that intends to use the Web Browser control for a live preview of some HTML this is clearly undesirable. Feature Delegation via Registry Hacks Fortunately starting with Internet Explore 8 and later there’s a fix for this problem via a registry setting. You can specify a registry key to specify which rendering mode and version of IE should be used by that application. These are not global mind you – they have to be enabled for each application individually. There are two different sets of keys for 32 bit and 64 bit applications. 32 bit: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Internet Explorer\MAIN\FeatureControl\FEATURE_BROWSER_EMULATION Value Key: yourapplication.exe 64 bit: HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Internet Explorer\MAIN\FeatureControl\FEATURE_BROWSER_EMULATION Value Key: yourapplication.exe The value to set this key to is (taken from MSDN here) as decimal values: 9999 (0x270F) Internet Explorer 9. Webpages are displayed in IE9 Standards mode, regardless of the !DOCTYPE directive. 9000 (0x2328) Internet Explorer 9. Webpages containing standards-based !DOCTYPE directives are displayed in IE9 mode. 8888 (0x22B8) Webpages are displayed in IE8 Standards mode, regardless of the !DOCTYPE directive. 8000 (0x1F40) Webpages containing standards-based !DOCTYPE directives are displayed in IE8 mode. 7000 (0x1B58) Webpages containing standards-based !DOCTYPE directives are displayed in IE7 Standards mode. The added key looks something like this in the Registry Editor: With this in place my Html Html Help Builder application which has wwhelp.exe as its main executable now works with HTML 5 and CSS 3 documents in the same way that Internet Explorer 9 does. Incidentally I accidentally added an ‘empty’ DWORD value of 0 to my EXE name and that worked as well giving me IE 9 rendering. Although not documented I suspect 0 (or an invalid value) will default to the installed browser. Don’t have a good way to test this but if somebody could try this with IE 8 installed that would be great: What happens when setting 9000 with IE 8 installed? What happens when setting 0 with IE 8 installed? Don’t forget to add Keys for Host Environments If you’re developing your application in Visual Studio and you run the debugger you may find that your application is still not rendering right, but if you run the actual generated EXE from Explorer or the OS command prompt it works. That’s because when you run the debugger in Visual Studio it wraps your application into a debugging host container. For this reason you might want to also add another registry key for yourapp.vshost.exe on your development machine. If you’re developing in Visual FoxPro make sure you add a key for vfp9.exe to see the rendering adjustments in the Visual FoxPro development environment. Cleaner HTML - no more HTML mangling! There are a number of additional benefits to setting up rendering of the Web Browser control to the IE 9 engine (or even the IE 8 engine) beyond the obvious rendering functionality. IE 9 actually returns your HTML in something that resembles the original HTML formatting, as opposed to the IE 7 default format which mangled the original HTML content. If you do the following in the WPF application: private void button2_Click(object sender, RoutedEventArgs e) { dynamic doc = this.webBrowser.Document; MessageBox.Show(doc.body.outerHtml); } you get different output depending on the rendering mode active. With the default IE 7 rendering you get: <BODY><DIV> <H1>Rounded Corners and Shadows - Creating Dialogs in CSS</H1> <DIV class=toolbarcontainer><A class=hoverbutton href="./"><IMG src="../../css/images/home.gif"> Home</A> <A class=hoverbutton href="RoundedCornersAndShadows.htm"><IMG src="../../css/images/refresh.gif"> Refresh</A> </DIV> <DIV class=containercontent> <FIELDSET><LEGEND>Plain Box</LEGEND> <DIV style="BORDER-BOTTOM: steelblue 2px solid; BORDER-LEFT: steelblue 2px solid; WIDTH: 550px; BORDER-TOP: steelblue 2px solid; BORDER-RIGHT: steelblue 2px solid" class="roundbox boxshadow"> <DIV style="BACKGROUND: khaki" class="boxcontenttext roundbox">Simple Rounded Corner Box. </DIV></DIV></FIELDSET> <FIELDSET><LEGEND>Box with Header</LEGEND> <DIV style="BORDER-BOTTOM: steelblue 2px solid; BORDER-LEFT: steelblue 2px solid; WIDTH: 550px; BORDER-TOP: steelblue 2px solid; BORDER-RIGHT: steelblue 2px solid" class="roundbox boxshadow"> <DIV class="gridheaderleft roundbox-top">Box with a Header</DIV> <DIV style="BACKGROUND: khaki" class="boxcontenttext roundbox-bottom">Simple Rounded Corner Box. </DIV></DIV></FIELDSET> <FIELDSET><LEGEND>Dialog Style Window</LEGEND> <DIV style="POSITION: relative; WIDTH: 450px" id=divDialog class="dialog boxshadow" jQuery16107208195684204002="2"> <DIV style="POSITION: relative" class=dialog-header> <DIV class=closebox></DIV>User Sign-in <DIV class=closebox jQuery16107208195684204002="3"></DIV></DIV> <DIV class=descriptionheader>This dialog is draggable and closable</DIV> <DIV class=dialog-content><LABEL>Username:</LABEL> <INPUT name=txtUsername value=" "> <LABEL>Password</LABEL> <INPUT name=txtPassword value=" "> <HR> <INPUT id=btnLogin value=Login type=button> </DIV> <DIV class=dialog-statusbar>Ready</DIV></DIV></FIELDSET> </DIV> <SCRIPT type=text/javascript> $(document).ready(function () { $("#divDialog") .draggable({ handle: ".dialog-header" }) .closable({ handle: ".dialog-header", closeHandler: function () { alert("Window about to be closed."); return true; // true closes - false leaves open } }); }); </SCRIPT> </DIV></BODY> Now lest you think I’m out of my mind and create complete whacky HTML rooted in the last century, here’s the IE 9 rendering mode output which looks a heck of a lot cleaner and a lot closer to my original HTML of the page I’m accessing: <body> <div> <h1>Rounded Corners and Shadows - Creating Dialogs in CSS</h1> <div class="toolbarcontainer"> <a class="hoverbutton" href="./"> <img src="../../css/images/home.gif"> Home</a> <a class="hoverbutton" href="RoundedCornersAndShadows.htm"> <img src="../../css/images/refresh.gif"> Refresh</a> </div> <div class="containercontent"> <fieldset> <legend>Plain Box</legend>  <div style="border: 2px solid steelblue; width: 550px;" class="roundbox boxshadow"> <div style="background: khaki;" class="boxcontenttext roundbox"> Simple Rounded Corner Box. </div> </div> </fieldset> <fieldset> <legend>Box with Header</legend> <div style="border: 2px solid steelblue; width: 550px;" class="roundbox boxshadow"> <div class="gridheaderleft roundbox-top">Box with a Header</div> <div style="background: khaki;" class="boxcontenttext roundbox-bottom"> Simple Rounded Corner Box. </div> </div> </fieldset> <fieldset> <legend>Dialog Style Window</legend> <div style="width: 450px; position: relative;" id="divDialog" class="dialog boxshadow"> <div style="position: relative;" class="dialog-header"> <div class="closebox"></div> User Sign-in <div class="closebox"></div></div> <div class="descriptionheader">This dialog is draggable and closable</div> <div class="dialog-content"> <label>Username:</label> <input name="txtUsername" value=" " type="text"> <label>Password</label> <input name="txtPassword" value=" " type="text"> <hr/> <input id="btnLogin" value="Login" type="button"> </div> <div class="dialog-statusbar">Ready</div> </div> </fieldset> </div> <script type="text/javascript"> $(document).ready(function () { $("#divDialog") .draggable({ handle: ".dialog-header" }) .closable({ handle: ".dialog-header", closeHandler: function () { alert("Window about to be closed."); return true; // true closes - false leaves open } }); }); </script> </div> </body> IOW, in IE9 rendering mode IE9 is much closer (but not identical) to the original HTML from the page on the Web that we’re reading from. As a side note: Unfortunately, the browser feature emulation can't be applied against the Html Help (CHM) Engine in Windows which uses the Web Browser control (or COM interfaces anyway) to render Html Help content. I tried setting up hh.exe which is the help viewer, to use IE 9 rendering but a help file generated with CSS3 features will simply show in IE 7 mode. Bummer - this would have been a nice quick fix to allow help content served from CHM files to look better. HTML Editing leaves HTML formatting intact In the same vane, if you do any inline HTML editing in the control by setting content to be editable, IE 9’s control does a much more reasonable job of creating usable and somewhat valid HTML. It also leaves the original content alone other than the text your are editing or adding. No longer is the HTML output stripped of excess spaces and reformatted in IEs format. So if I do: private void button3_Click(object sender, RoutedEventArgs e) { dynamic doc = this.webBrowser.Document; doc.body.contentEditable = true; } and then make some changes to the document by typing into it using IE 9 mode, the document formatting stays intact and only the affected content is modified. The created HTML is reasonably clean (although it does lack proper XHTML formatting for things like <br/> <hr/>). This is very different from IE 7 mode which mangled the HTML as soon as the page was loaded into the control. Any editing you did stripped out all white space and lost all of your existing XHTML formatting. In IE 9 mode at least *most* of your original formatting stays intact. This is huge! In Html Help Builder I have supported HTML editing for a long time but the HTML mangling by the Web Browser control made it very difficult to edit the HTML later. Previously IE would mangle the HTML by stripping out spaces, upper casing all tags and converting many XHTML safe tags to its HTML 3 tags. Now IE leaves most of my document alone while editing, and creates cleaner and more compliant markup (with exception of self-closing elements like BR/HR). The end result is that I now have HTML editing in place that's much cleaner and actually capable of being manually edited. Caveats, Caveats, Caveats It wouldn't be Internet Explorer if there weren't some major compatibility issues involved in using this various browser version interaction. The biggest thing I ran into is that there are odd differences in some of the COM interfaces and what they return. I specifically ran into a problem with the document.selection.createRange() function which with IE 7 compatibility returns an expected text range object. When running in IE 8 or IE 9 mode however. I could not retrieve a valid text range with this code where loEdit is the WebBrowser control: loRange = loEdit.document.selection.CreateRange() The loRange object returned (here in FoxPro) had a length property of 0 but none of the other properties of the TextRange or TextRangeCollection objects were available. I figured this was due to some changed security settings but even after elevating the Intranet Security Zone and mucking with the other browser feature flags pertaining to security I had no luck. In the end I relented and used a JavaScript function in my editor document that returns a selection range object: function getselectionrange() { var range = document.selection.createRange(); return range; } and call that JavaScript function from my host applications code: *** Use a function in the document to get around HTML Editing issues loRange = loEdit.document.parentWindow.getselectionrange(.f.) and that does work correctly. This wasn't a big deal as I'm already loading a support script file into the editor page so all I had to do is add the function to this existing script file. You can find out more how to call script code in the Web Browser control from a host application in a previous post of mine. IE 8 and 9 also clamp down the security environment a little more than the default IE 7 control, so there may be other issues you run into. Other than the createRange() problem above I haven't seen anything else that is breaking in my code so far though and that's encouraging at least since it uses a lot of HTML document manipulation for the custom editor I've created (and would love to replace - any PROFESSIONAL alternatives anybody?) Registry Key Installation for your Application It’s important to remember that this registry setting is made per application, so most likely this is something you want to set up with your installer. Also remember that 32 and 64 bit settings require separate settings in the registry so if you’re creating your installer you most likely will want to set both keys in the registry preemptively for your application. I use Tarma Installer for all of my application installs and in Tarma I configure registry keys for both and set a flag to only install the latter key group in the 64 bit version: Because this setting is application specific you have to do this for every application you install unfortunately, but this also means that you can safely configure this setting in the registry because it is after only applied to your application. Another problem with install based installation is version detection. If IE 8 is installed I’d want 8000 for the value, if IE 9 is installed I want 9000. I can do this easily in code but in the installer this is much more difficult. I don’t have a good solution for this at the moment, but given that the app works with IE 7 mode now, IE 9 mode is just a bonus for the moment. If IE 9 is not installed and 9000 is used the default rendering will remain in use. It sure would be nice if we could specify the IE rendering mode as a property, but I suspect the ActiveX container has to know before it loads what actual version to load up and once loaded can only load a single version of IE. This would account for this annoying application level configuration… Summary The registry feature emulation has been available for quite some time, but I just found out about it today and started experimenting around with it. I’m stoked to see that this is available as I’d pretty much given up in ever seeing any better rendering in the Web Browser control. Now at least my apps can take advantage of newer HTML features. Now if we could only get better HTML Editing support somehow <snicker>… ah can’t have everything.© Rick Strahl, West Wind Technologies, 2005-2011Posted in .NET FoxPro Windows

Read the article

Using HTML 5 SessionState to save rendered Page Content

- by Rick Strahl

HTML 5 SessionState and LocalStorage are very useful and super easy to use to manage client side state. For building rich client side or SPA style applications it's a vital feature to be able to cache user data as well as HTML content in order to swap pages in and out of the browser's DOM. What might not be so obvious is that you can also use the sessionState and localStorage objects even in classic server rendered HTML applications to provide caching features between pages. These APIs have been around for a long time and are supported by most relatively modern browsers and even all the way back to IE8, so you can use them safely in your Web applications. SessionState and LocalStorage are easy The APIs that make up sessionState and localStorage are very simple. Both object feature the same API interface which is a simple, string based key value store that has getItem, setItem, removeitem, clear and key methods. The objects are also pseudo array objects and so can be iterated like an array with a length property and you have array indexers to set and get values with. Basic usage for storing and retrieval looks like this (using sessionStorage, but the syntax is the same for localStorage - just switch the objects):// set var lastAccess = new Date().getTime(); if (sessionStorage) sessionStorage.setItem("myapp_time", lastAccess.toString()); // retrieve in another page or on a refresh var time = null; if (sessionStorage) time = sessionStorage.getItem("myapp_time"); if (time) time = new Date(time * 1); else time = new Date(); sessionState stores data that is browser session specific and that has a liftetime of the active browser session or window. Shut down the browser or tab and the storage goes away. localStorage uses the same API interface, but the lifetime of the data is permanently stored in the browsers storage area until deleted via code or by clearing out browser cookies (not the cache). Both sessionStorage and localStorage space is limited. The spec is ambiguous about this - supposedly sessionStorage should allow for unlimited size, but it appears that most WebKit browsers support only 2.5mb for either object. This means you have to be careful what you store especially since other applications might be running on the same domain and also use the storage mechanisms. That said 2.5mb worth of character data is quite a bit and would go a long way. The easiest way to get a feel for how sessionState and localStorage work is to look at a simple example. You can go check out the following example online in Plunker: http://plnkr.co/edit/0ICotzkoPjHaWa70GlRZ?p=preview which looks like this: Plunker is an online HTML/JavaScript editor that lets you write and run Javascript code and similar to JsFiddle, but a bit cleaner to work in IMHO (thanks to John Papa for turning me on to it). The sample has two text boxes with counts that update session/local storage every time you click the related button. The counts are 'cached' in Session and Local storage. The point of these examples is that both counters survive full page reloads, and the LocalStorage counter survives a complete browser shutdown and restart. Go ahead and try it out by clicking the Reload button after updating both counters and then shutting down the browser completely and going back to the same URL (with the same browser). What you should see is that reloads leave both counters intact at the counted values, while a browser restart will leave only the local storage counter intact. The code to deal with the SessionStorage (and LocalStorage not shown here) in the example is isolated into a couple of wrapper methods to simplify the code: function getSessionCount() { var count = 0; if (sessionStorage) { var count = sessionStorage.getItem("ss_count"); count = !count ? 0 : count * 1; } $("#txtSession").val(count); return count; } function setSessionCount(count) { if (sessionStorage) sessionStorage.setItem("ss_count", count.toString()); } These two functions essentially load and store a session counter value. The two key methods used here are: sessionStorage.getItem(key); sessionStorage.setItem(key,stringVal); Note that the value given to setItem and return by getItem has to be a string. If you pass another type you get an error. Don't let that limit you though - you can easily enough store JSON data in a variable so it's quite possible to pass complex objects and store them into a single sessionStorage value:var user = { name: "Rick", id="ricks", level=8 } sessionStorage.setItem("app_user",JSON.stringify(user)); to retrieve it:var user = sessionStorage.getItem("app_user"); if (user) user = JSON.parse(user); Simple! If you're using the Chrome Developer Tools (F12) you can also check out the session and local storage state on the Resource tab: You can also use this tool to refresh or remove entries from storage. What we just looked at is a purely client side implementation where a couple of counters are stored. For rich client centric AJAX applications sessionStorage and localStorage provide a very nice and simple API to store application state while the application is running. But you can also use these storage mechanisms to manage server centric HTML applications when you combine server rendering with some JavaScript to perform client side data caching. You can both store some state information and data on the client (ie. store a JSON object and carry it forth between server rendered HTML requests) or you can use it for good old HTTP based caching where some rendered HTML is saved and then restored later. Let's look at the latter with a real life example. Why do I need Client-side Page Caching for Server Rendered HTML? I don't know about you, but in a lot of my existing server driven applications I have lists that display a fair amount of data. Typically these lists contain links to then drill down into more specific data either for viewing or editing. You can then click on a link and go off to a detail page that provides more concise content. So far so good. But now you're done with the detail page and need to get back to the list, so you click on a 'bread crumbs trail' or an application level 'back to list' button and… …you end up back at the top of the list - the scroll position, the current selection in some cases even filters conditions - all gone with the wind. You've left behind the state of the list and are starting from scratch in your browsing of the list from the top. Not cool! Sound familiar? This a pretty common scenario with server rendered HTML content where it's so common to display lists to drill into, only to lose state in the process of returning back to the original list. Look at just about any traditional forums application, or even StackOverFlow to see what I mean here. Scroll down a bit to look at a post or entry, drill in then use the bread crumbs or tab to go back… In some cases returning to the top of a list is not a big deal. On StackOverFlow that sort of works because content is turning around so quickly you probably want to actually look at the top posts. Not always though - if you're browsing through a list of search topics you're interested in and drill in there's no way back to that position. Essentially anytime you're actively browsing the items in the list, that's when state becomes important and if it's not handled the user experience can be really disrupting. Content Caching If you're building client centric SPA style applications this is a fairly easy to solve problem - you tend to render the list once and then update the page content to overlay the detail content, only hiding the list temporarily until it's used again later. It's relatively easy to accomplish this simply by hiding content on the page and later making it visible again. But if you use server rendered content, hanging on to all the detail like filters, selections and scroll position is not quite as easy. Or is it??? This is where sessionStorage comes in handy. What if we just save the rendered content of a previous page, and then restore it when we return to this page based on a special flag that tells us to use the cached version? Let's see how we can do this. A real World Use Case Recently my local ISP asked me to help out with updating an ancient classifieds application. They had a very busy, local classifieds app that was originally an ASP classic application. The old app was - wait for it: frames based - and even though I lobbied against it, the decision was made to keep the frames based layout to allow rapid browsing of the hundreds of posts that are made on a daily basis. The primary reason they wanted this was precisely for the ability to quickly browse content item by item. While I personally hate working with Frames, I have to admit that the UI actually works well with the frames layout as long as you're running on a large desktop screen. You can check out the frames based desktop site here: http://classifieds.gorge.net/ However when I rebuilt the app I also added a secondary view that doesn't use frames. The main reason for this of course was for mobile displays which work horribly with frames. So there's a somewhat mobile friendly interface to the interface, which ditches the frames and uses some responsive design tweaking for mobile capable operation: http://classifeds.gorge.net/mobile (or browse the base url with your browser width under 800px) Here's what the mobile, non-frames view looks like: As you can see this means that the list of classifieds posts now is a list and there's a separate page for drilling down into the item. And of course… originally we ran into that usability issue I mentioned earlier where the browse, view detail, go back to the list cycle resulted in lost list state. Originally in mobile mode you scrolled through the list, found an item to look at and drilled in to display the item detail. Then you clicked back to the list and BAM - you've lost your place. Because there are so many items added on a daily basis the full list is never fully loaded, but rather there's a "Load Additional Listings" entry at the button. Not only did we originally lose our place when coming back to the list, but any 'additionally loaded' items are no longer there because the list was now rendering as if it was the first page hit. The additional listings, and any filters, the selection of an item all were lost. Major Suckage! Using Client SessionStorage to cache Server Rendered Content To work around this problem I decided to cache the rendered page content from the list in SessionStorage. Anytime the list renders or is updated with Load Additional Listings, the page HTML is cached and stored in Session Storage. Any back links from the detail page or the login or write entry forms then point back to the list page with a back=true query string parameter. If the server side sees this parameter it doesn't render the part of the page that is cached. Instead the client side code retrieves the data from the sessionState cache and simply inserts it into the page. It sounds pretty simple, and the overall the process is really easy, but there are a few gotchas that I'll discuss in a minute. But first let's look at the implementation. Let's start with the server side here because that'll give a quick idea of the doc structure. As I mentioned the server renders data from an ASP.NET MVC view. On the list page when returning to the list page from the display page (or a host of other pages) looks like this: https://classifieds.gorge.net/list?back=True The query string value is a flag, that indicates whether the server should render the HTML. Here's what the top level MVC Razor view for the list page looks like:@model MessageListViewModel @{ ViewBag.Title = "Classified Listing"; bool isBack = !string.IsNullOrEmpty(Request.QueryString["back"]); } <form method="post" action="@Url.Action("list")"> <div id="SizingContainer"> @if (!isBack) { @Html.Partial("List_CommandBar_Partial", Model) <div id="PostItemContainer" class="scrollbox" xstyle="-webkit-overflow-scrolling: touch;"> @Html.Partial("List_Items_Partial", Model) @if (Model.RequireLoadEntry) { <div class="postitem loadpostitems" style="padding: 15px;"> <div id="LoadProgress" class="smallprogressright"></div> <div class="control-progress"> Load additional listings... </div> </div> } </div> } </div> </form> As you can see the query string triggers a conditional block that if set is simply not rendered. The content inside of #SizingContainer basically holds the entire page's HTML sans the headers and scripts, but including the filter options and menu at the top. In this case this makes good sense - in other situations the fact that the menu or filter options might be dynamically updated might make you only cache the list rather than essentially the entire page. In this particular instance all of the content works and produces the proper result as both the list along with any filter conditions in the form inputs are restored. Ok, let's move on to the client. On the client there are two page level functions that deal with saving and restoring state. Like the counter example I showed earlier, I like to wrap the logic to save and restore values from sessionState into a separate function because they are almost always used in several places.page.saveData = function(id) { if (!sessionStorage) return; var data = { id: id, scroll: $("#PostItemContainer").scrollTop(), html: $("#SizingContainer").html() }; sessionStorage.setItem("list_html",JSON.stringify(data)); }; page.restoreData = function() { if (!sessionStorage) return; var data = sessionStorage.getItem("list_html"); if (!data) return null; return JSON.parse(data); }; The data that is saved is an object which contains an ID which is the selected element when the user clicks and a scroll position. These two values are used to reset the scroll position when the data is used from the cache. Finally the html from the #SizingContainer element is stored, which makes for the bulk of the document's HTML. In this application the HTML captured could be a substantial bit of data. If you recall, I mentioned that the server side code renders a small chunk of data initially and then gets more data if the user reads through the first 50 or so items. The rest of the items retrieved can be rather sizable. Other than the JSON deserialization that's Ok. Since I'm using SessionStorage the storage space has no immediate limits. Next is the core logic to handle saving and restoring the page state. At first though this would seem pretty simple, and in some cases it might be, but as the following code demonstrates there are a few gotchas to watch out for. Here's the relevant code I use to save and restore:$( function() { … var isBack = getUrlEncodedKey("back", location.href); if (isBack) { // remove the back key from URL setUrlEncodedKey("back", "", location.href); var data = page.restoreData(); // restore from sessionState if (!data) { // no data - force redisplay of the server side default list window.location = "list"; return; } $("#SizingContainer").html(data.html); var el = $(".postitem[data-id=" + data.id + "]"); $(".postitem").removeClass("highlight"); el.addClass("highlight"); $("#PostItemContainer").scrollTop(data.scroll); setTimeout(function() { el.removeClass("highlight"); }, 2500); } else if (window.noFrames) page.saveData(null); // save when page loads $("#SizingContainer").on("click", ".postitem", function() { var id = $(this).attr("data-id"); if (!id) return true; if (window.noFrames) page.saveData(id); var contentFrame = window.parent.frames["Content"]; if (contentFrame) contentFrame.location.href = "show/" + id; else window.location.href = "show/" + id; return false; }); … The code starts out by checking for the back query string flag which triggers restoring from the client cache. If cached the cached data structure is read from sessionStorage. It's important here to check if data was returned. If the user had back=true on the querystring but there is no cached data, he likely bookmarked this page or otherwise shut down the browser and came back to this URL. In that case the server didn't render any detail and we have no cached data, so all we can do is redirect to the original default list view using window.location. If we continued the page would render no data - so make sure to always check the cache retrieval result. Always! If there is data the it's loaded and the data.html data is restored back into the document by simply injecting the HTML back into the document's #SizingContainer element:$("#SizingContainer").html(data.html); It's that simple and it's quite quick even with a fully loaded list of additional items and on a phone. The actual HTML data is stored to the cache on every page load initially and then again when the user clicks on an element to navigate to a particular listing. The former ensures that the client cache always has something in it, and the latter updates with additional information for the selected element. For the click handling I use a data-id attribute on the list item (.postitem) in the list and retrieve the id from that. That id is then used to navigate to the actual entry as well as storing that Id value in the saved cached data. The id is used to reset the selection by searching for the data-id value in the restored elements. The overall process of this save/restore process is pretty straight forward and it doesn't require a bunch of code, yet it yields a huge improvement in the usability of the site on mobile devices (or anybody who uses the non-frames view). Some things to watch out for As easy as it conceptually seems to simply store and retrieve cached content, you have to be quite aware what type of content you are caching. The code above is all that's specific to cache/restore cycle and it works, but it took a few tweaks to the rest of the script code and server code to make it all work. There were a few gotchas that weren't immediately obvious. Here are a few things to pay attention to: Event Handling Logic Timing of manipulating DOM events Inline Script Code Bookmarking to the Cache Url when no cache exists Do you have inline script code in your HTML? That script code isn't going to run if you restore from cache and simply assign or it may not run at the time you think it would normally in the DOM rendering cycle. JavaScript Event Hookups The biggest issue I ran into with this approach almost immediately is that originally I had various static event handlers hooked up to various UI elements that are now cached. If you have an event handler like:$("#btnSearch").click( function() {…}); that works fine when the page loads with server rendered HTML, but that code breaks when you now load the HTML from cache. Why? Because the elements you're trying to hook those events to may not actually be there - yet. Luckily there's an easy workaround for this by using deferred events. With jQuery you can use the .on() event handler instead:$("#SelectionContainer").on("click","#btnSearch", function() {…}); which monitors a parent element for the events and checks for the inner selector elements to handle events on. This effectively defers to runtime event binding, so as more items are added to the document bindings still work. For any cached content use deferred events. Timing of manipulating DOM Elements Along the same lines make sure that your DOM manipulation code follows the code that loads the cached content into the page so that you don't manipulate DOM elements that don't exist just yet. Ideally you'll want to check for the condition to restore cached content towards the top of your script code, but that can be tricky if you have components or other logic that might not all run in a straight line. Inline Script Code Here's another small problem I ran into: I use a DateTime Picker widget I built a while back that relies on the jQuery date time picker. I also created a helper function that allows keyboard date navigation into it that uses JavaScript logic. Because MVC's limited 'object model' the only way to embed widget content into the page is through inline script. This code broken when I inserted the cached HTML into the page because the script code was not available when the component actually got injected into the page. As the last bullet - it's a matter of timing. There's no good work around for this - in my case I pulled out the jQuery date picker and relied on native <input type="date" /> logic instead - a better choice these days anyway, especially since this view is meant to be primarily to serve mobile devices which actually support date input through the browser (unlike desktop browsers of which only WebKit seems to support it). Bookmarking Cached Urls When you cache HTML content you have to make a decision whether you cache on the client and also not render that same content on the server. In the Classifieds app I didn't render server side content so if the user comes to the page with back=True and there is no cached content I have to a have a Plan B. Typically this happens when somebody ends up bookmarking the back URL. The easiest and safest solution for this scenario is to ALWAYS check the cache result to make sure it exists and if not have a safe URL to go back to - in this case to the plain uncached list URL which amounts to effectively redirecting. This seems really obvious in hindsight, but it's easy to overlook and not see a problem until much later, when it's not obvious at all why the page is not rendering anything. Don't use <body> to replace Content Since we're practically replacing all the HTML in the page it may seem tempting to simply replace the HTML content of the <body> tag. Don't. The body tag usually contains key things that should stay in the page and be there when it loads. Specifically script tags and elements and possibly other embedded content. It's best to create a top level DOM element specifically as a placeholder container for your cached content and wrap just around the actual content you want to replace. In the app above the #SizingContainer is that container. Other Approaches The approach I've used for this application is kind of specific to the existing server rendered application we're running and so it's just one approach you can take with caching. However for server rendered content caching this is a pattern I've used in a few apps to retrofit some client caching into list displays. In this application I took the path of least resistance to the existing server rendering logic. Here are a few other ways that come to mind: Using Partial HTML Rendering via AJAXInstead of rendering the page initially on the server, the page would load empty and the client would render the UI by retrieving the respective HTML and embedding it into the page from a Partial View. This effectively makes the initial rendering and the cached rendering logic identical and removes the server having to decide whether this request needs to be rendered or not (ie. not checking for a back=true switch). All the logic related to caching is made on the client in this case. Using JSON Data and Client RenderingThe hardcore client option is to do the whole UI SPA style and pull data from the server and then use client rendering or databinding to pull the data down and render using templates or client side databinding with knockout/angular et al. As with the Partial Rendering approach the advantage is that there's no difference in the logic between pulling the data from cache or rendering from scratch other than the initial check for the cache request. Of course if the app is a full on SPA app, then caching may not be required even - the list could just stay in memory and be hidden and reactivated. I'm sure there are a number of other ways this can be handled as well especially using AJAX. AJAX rendering might simplify the logic, but it also complicates search engine optimization since there's no content loaded initially. So there are always tradeoffs and it's important to look at all angles before deciding on any sort of caching solution in general. State of the Session SessionState and LocalStorage are easy to use in client code and can be integrated even with server centric applications to provide nice caching features of content and data. In this post I've shown a very specific scenario of storing HTML content for the purpose of remembering list view data and state and making the browsing experience for lists a bit more friendly, especially if there's dynamically loaded content involved. If you haven't played with sessionStorage or localStorage I encourage you to give it a try. There's a lot of cool stuff that you can do with this beyond the specific scenario I've covered here… Resources Overview of localStorage (also applies to sessionStorage) Web Storage Compatibility Modernizr Test Suite© Rick Strahl, West Wind Technologies, 2005-2013Posted in JavaScript HTML5 ASP.NET MVC Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article

An Introduction to ASP.NET Web API

- by Rick Strahl

Microsoft recently released ASP.NET MVC 4.0 and .NET 4.5 and along with it, the brand spanking new ASP.NET Web API. Web API is an exciting new addition to the ASP.NET stack that provides a new, well-designed HTTP framework for creating REST and AJAX APIs (API is Microsoft’s new jargon for a service, in case you’re wondering). Although Web API ships and installs with ASP.NET MVC 4, you can use Web API functionality in any ASP.NET project, including WebForms, WebPages and MVC or just a Web API by itself. And you can also self-host Web API in your own applications from Console, Desktop or Service applications. If you're interested in a high level overview on what ASP.NET Web API is and how it fits into the ASP.NET stack you can check out my previous post: Where does ASP.NET Web API fit? In the following article, I'll focus on a practical, by example introduction to ASP.NET Web API. All the code discussed in this article is available in GitHub: https://github.com/RickStrahl/AspNetWebApiArticle [republished from my Code Magazine Article and updated for RTM release of ASP.NET Web API] Getting Started To start I’ll create a new empty ASP.NET application to demonstrate that Web API can work with any kind of ASP.NET project. Although you can create a new project based on the ASP.NET MVC/Web API template to quickly get up and running, I’ll take you through the manual setup process, because one common use case is to add Web API functionality to an existing ASP.NET application. This process describes the steps needed to hook up Web API to any ASP.NET 4.0 application. Start by creating an ASP.NET Empty Project. Then create a new folder in the project called Controllers. Add a Web API Controller Class Once you have any kind of ASP.NET project open, you can add a Web API Controller class to it. Web API Controllers are very similar to MVC Controller classes, but they work in any kind of project. Add a new item to this folder by using the Add New Item option in Visual Studio and choose Web API Controller Class, as shown in Figure 1. Figure 1: This is how you create a new Controller Class in Visual Studio Make sure that the name of the controller class includes Controller at the end of it, which is required in order for Web API routing to find it. Here, the name for the class is AlbumApiController. For this example, I’ll use a Music Album model to demonstrate basic behavior of Web API. The model consists of albums and related songs where an album has properties like Name, Artist and YearReleased and a list of songs with a SongName and SongLength as well as an AlbumId that links it to the album. You can find the code for the model (and the rest of these samples) on Github. To add the file manually, create a new folder called Model, and add a new class Album.cs and copy the code into it. There’s a static AlbumData class with a static CreateSampleAlbumData() method that creates a short list of albums on a static .Current that I’ll use for the examples. Before we look at what goes into the controller class though, let’s hook up routing so we can access this new controller. Hooking up Routing in Global.asax To start, I need to perform the one required configuration task in order for Web API to work: I need to configure routing to the controller. Like MVC, Web API uses routing to provide clean, extension-less URLs to controller methods. Using an extension method to ASP.NET’s static RouteTable class, you can use the MapHttpRoute() (in the System.Web.Http namespace) method to hook-up the routing during Application_Start in global.asax.cs shown in Listing 1.using System; using System.Web.Routing; using System.Web.Http; namespace AspNetWebApi { public class Global : System.Web.HttpApplication { protected void Application_Start(object sender, EventArgs e) { RouteTable.Routes.MapHttpRoute( name: "AlbumVerbs", routeTemplate: "albums/{title}", defaults: new { symbol = RouteParameter.Optional, controller="AlbumApi" } ); } } } This route configures Web API to direct URLs that start with an albums folder to the AlbumApiController class. Routing in ASP.NET is used to create extensionless URLs and allows you to map segments of the URL to specific Route Value parameters. A route parameter, with a name inside curly brackets like {name}, is mapped to parameters on the controller methods. Route parameters can be optional, and there are two special route parameters – controller and action – that determine the controller to call and the method to activate respectively. HTTP Verb Routing Routing in Web API can route requests by HTTP Verb in addition to standard {controller},{action} routing. For the first examples, I use HTTP Verb routing, as shown Listing 1. Notice that the route I’ve defined does not include an {action} route value or action value in the defaults. Rather, Web API can use the HTTP Verb in this route to determine the method to call the controller, and a GET request maps to any method that starts with Get. So methods called Get() or GetAlbums() are matched by a GET request and a POST request maps to a Post() or PostAlbum(). Web API matches a method by name and parameter signature to match a route, query string or POST values. In lieu of the method name, the [HttpGet,HttpPost,HttpPut,HttpDelete, etc] attributes can also be used to designate the accepted verbs explicitly if you don’t want to follow the verb naming conventions. Although HTTP Verb routing is a good practice for REST style resource APIs, it’s not required and you can still use more traditional routes with an explicit {action} route parameter. When {action} is supplied, the HTTP verb routing is ignored. I’ll talk more about alternate routes later. When you’re finished with initial creation of files, your project should look like Figure 2. Figure 2: The initial project has the new API Controller Album model Creating a small Album Model Now it’s time to create some controller methods to serve data. For these examples, I’ll use a very simple Album and Songs model to play with, as shown in Listing 2. public class Song { public string AlbumId { get; set; } [Required, StringLength(80)] public string SongName { get; set; } [StringLength(5)] public string SongLength { get; set; } } public class Album { public string Id { get; set; } [Required, StringLength(80)] public string AlbumName { get; set; } [StringLength(80)] public string Artist { get; set; } public int YearReleased { get; set; } public DateTime Entered { get; set; } [StringLength(150)] public string AlbumImageUrl { get; set; } [StringLength(200)] public string AmazonUrl { get; set; } public virtual List<Song> Songs { get; set; } public Album() { Songs = new List<Song>(); Entered = DateTime.Now; // Poor man's unique Id off GUID hash Id = Guid.NewGuid().GetHashCode().ToString("x"); } public void AddSong(string songName, string songLength = null) { this.Songs.Add(new Song() { AlbumId = this.Id, SongName = songName, SongLength = songLength }); } } Once the model has been created, I also added an AlbumData class that generates some static data in memory that is loaded onto a static .Current member. The signature of this class looks like this and that's what I'll access to retrieve the base data:public static class AlbumData { // sample data - static list public static List<Album> Current = CreateSampleAlbumData(); /// <summary> /// Create some sample data /// </summary> /// <returns></returns> public static List<Album> CreateSampleAlbumData() { … }} You can check out the full code for the data generation online. Creating an AlbumApiController Web API shares many concepts of ASP.NET MVC, and the implementation of your API logic is done by implementing a subclass of the System.Web.Http.ApiController class. Each public method in the implemented controller is a potential endpoint for the HTTP API, as long as a matching route can be found to invoke it. The class name you create should end in Controller, which is how Web API matches the controller route value to figure out which class to invoke. Inside the controller you can implement methods that take standard .NET input parameters and return .NET values as results. Web API’s binding tries to match POST data, route values, form values or query string values to your parameters. Because the controller is configured for HTTP Verb based routing (no {action} parameter in the route), any methods that start with Getxxxx() are called by an HTTP GET operation. You can have multiple methods that match each HTTP Verb as long as the parameter signatures are different and can be matched by Web API. In Listing 3, I create an AlbumApiController with two methods to retrieve a list of albums and a single album by its title .public class AlbumApiController : ApiController { public IEnumerable<Album> GetAlbums() { var albums = AlbumData.Current.OrderBy(alb => alb.Artist); return albums; } public Album GetAlbum(string title) { var album = AlbumData.Current .SingleOrDefault(alb => alb.AlbumName.Contains(title)); return album; }} To access the first two requests, you can use the following URLs in your browser: http://localhost/aspnetWebApi/albumshttp://localhost/aspnetWebApi/albums/Dirty%20Deeds Note that you’re not specifying the actions of GetAlbum or GetAlbums in these URLs. Instead Web API’s routing uses HTTP GET verb to route to these methods that start with Getxxx() with the first mapping to the parameterless GetAlbums() method and the latter to the GetAlbum(title) method that receives the title parameter mapped as optional in the route. Content Negotiation When you access any of the URLs above from a browser, you get either an XML or JSON result returned back. The album list result for Chrome 17 and Internet Explorer 9 is shown Figure 3. Figure 3: Web API responses can vary depending on the browser used, demonstrating Content Negotiation in action as these two browsers send different HTTP Accept headers. Notice that the results are not the same: Chrome returns an XML response and IE9 returns a JSON response. Whoa, what’s going on here? Shouldn’t we see the same result in both browsers? Actually, no. Web API determines what type of content to return based on Accept headers. HTTP clients, like browsers, use Accept headers to specify what kind of content they’d like to see returned. Browsers generally ask for HTML first, followed by a few additional content types. Chrome (and most other major browsers) ask for: Accept: text/html, application/xhtml+xml,application/xml; q=0.9,*/*;q=0.8 IE9 asks for: Accept: text/html, application/xhtml+xml, */* Note that Chrome’s Accept header includes application/xml, which Web API finds in its list of supported media types and returns an XML response. IE9 does not include an Accept header type that works on Web API by default, and so it returns the default format, which is JSON. This is an important and very useful feature that was missing from any previous Microsoft REST tools: Web API automatically switches output formats based on HTTP Accept headers. Nowhere in the server code above do you have to explicitly specify the output format. Rather, Web API determines what format the client is requesting based on the Accept headers and automatically returns the result based on the available formatters. This means that a single method can handle both XML and JSON results.. Using this simple approach makes it very easy to create a single controller method that can return JSON, XML, ATOM or even OData feeds by providing the appropriate Accept header from the client. By default you don’t have to worry about the output format in your code. Note that you can still specify an explicit output format if you choose, either globally by overriding the installed formatters, or individually by returning a lower level HttpResponseMessage instance and setting the formatter explicitly. More on that in a minute. Along the same lines, any content sent to the server via POST/PUT is parsed by Web API based on the HTTP Content-type of the data sent. The same formats allowed for output are also allowed on input. Again, you don’t have to do anything in your code – Web API automatically performs the deserialization from the content. Accessing Web API JSON Data with jQuery A very common scenario for Web API endpoints is to retrieve data for AJAX calls from the Web browser. Because JSON is the default format for Web API, it’s easy to access data from the server using jQuery and its getJSON() method. This example receives the albums array from GetAlbums() and databinds it into the page using knockout.js.$.getJSON("albums/", function (albums) { // make knockout template visible $(".album").show(); // create view object and attach array var view = { albums: albums }; ko.applyBindings(view); }); Figure 4 shows this and the next example’s HTML output. You can check out the complete HTML and script code at http://goo.gl/Ix33C (.html) and http://goo.gl/tETlg (.js). Figu Figure 4: The Album Display sample uses JSON data loaded from Web API. The result from the getJSON() call is a JavaScript object of the server result, which comes back as a JavaScript array. In the code, I use knockout.js to bind this array into the UI, which as you can see, requires very little code, instead using knockout’s data-bind attributes to bind server data to the UI. Of course, this is just one way to use the data – it’s entirely up to you to decide what to do with the data in your client code. Along the same lines, I can retrieve a single album to display when the user clicks on an album. The response returns the album information and a child array with all the songs. The code to do this is very similar to the last example where we pulled the albums array:$(".albumlink").live("click", function () { var id = $(this).data("id"); // title $.getJSON("albums/" + id, function (album) { ko.applyBindings(album, $("#divAlbumDialog")[0]); $("#divAlbumDialog").show(); }); }); Here the URL looks like this: /albums/Dirty%20Deeds, where the title is the ID captured from the clicked element’s data ID attribute. Explicitly Overriding Output Format When Web API automatically converts output using content negotiation, it does so by matching Accept header media types to the GlobalConfiguration.Configuration.Formatters and the SupportedMediaTypes of each individual formatter. You can add and remove formatters to globally affect what formats are available and it’s easy to create and plug in custom formatters.The example project includes a JSONP formatter that can be plugged in to provide JSONP support for requests that have a callback= querystring parameter. Adding, removing or replacing formatters is a global option you can use to manipulate content. It’s beyond the scope of this introduction to show how it works, but you can review the sample code or check out my blog entry on the subject (http://goo.gl/UAzaR). If automatic processing is not desirable in a particular Controller method, you can override the response output explicitly by returning an HttpResponseMessage instance. HttpResponseMessage is similar to ActionResult in ASP.NET MVC in that it’s a common way to return an abstract result message that contains content. HttpResponseMessage s parsed by the Web API framework using standard interfaces to retrieve the response data, status code, headers and so on[MS2] . Web API turns every response – including those Controller methods that return static results – into HttpResponseMessage instances. Explicitly returning an HttpResponseMessage instance gives you full control over the output and lets you mostly bypass WebAPI’s post-processing of the HTTP response on your behalf. HttpResponseMessage allows you to customize the response in great detail. Web API’s attention to detail in the HTTP spec really shows; many HTTP options are exposed as properties and enumerations with detailed IntelliSense comments. Even if you’re new to building REST-based interfaces, the API guides you in the right direction for returning valid responses and response codes. For example, assume that I always want to return JSON from the GetAlbums() controller method and ignore the default media type content negotiation. To do this, I can adjust the output format and headers as shown in Listing 4.public HttpResponseMessage GetAlbums() { var albums = AlbumData.Current.OrderBy(alb => alb.Artist); // Create a new HttpResponse with Json Formatter explicitly var resp = new HttpResponseMessage(HttpStatusCode.OK); resp.Content = new ObjectContent<IEnumerable<Album>>( albums, new JsonMediaTypeFormatter()); // Get Default Formatter based on Content Negotiation //var resp = Request.CreateResponse<IEnumerable<Album>>(HttpStatusCode.OK, albums); resp.Headers.ConnectionClose = true; resp.Headers.CacheControl = new CacheControlHeaderValue(); resp.Headers.CacheControl.Public = true; return resp; } This example returns the same IEnumerable<Album> value, but it wraps the response into an HttpResponseMessage so you can control the entire HTTP message result including the headers, formatter and status code. In Listing 4, I explicitly specify the formatter using the JsonMediaTypeFormatter to always force the content to JSON. If you prefer to use the default content negotiation with HttpResponseMessage results, you can create the Response instance using the Request.CreateResponse method:var resp = Request.CreateResponse<IEnumerable<Album>>(HttpStatusCode.OK, albums); This provides you an HttpResponse object that's pre-configured with the default formatter based on Content Negotiation. Once you have an HttpResponse object you can easily control most HTTP aspects on this object. What's sweet here is that there are many more detailed properties on HttpResponse than the core ASP.NET Response object, with most options being explicitly configurable with enumerations that make it easy to pick the right headers and response codes from a list of valid codes. It makes HTTP features available much more discoverable even for non-hardcore REST/HTTP geeks. Non-Serialized Results The output returned doesn’t have to be a serialized value but can also be raw data, like strings, binary data or streams. You can use the HttpResponseMessage.Content object to set a number of common Content classes. Listing 5 shows how to return a binary image using the ByteArrayContent class from a Controller method. [HttpGet] public HttpResponseMessage AlbumArt(string title) { var album = AlbumData.Current.FirstOrDefault(abl => abl.AlbumName.StartsWith(title)); if (album == null) { var resp = Request.CreateResponse<ApiMessageError>( HttpStatusCode.NotFound, new ApiMessageError("Album not found")); return resp; } // kinda silly - we would normally serve this directly // but hey - it's a demo. var http = new WebClient(); var imageData = http.DownloadData(album.AlbumImageUrl); // create response and return var result = new HttpResponseMessage(HttpStatusCode.OK); result.Content = new ByteArrayContent(imageData); result.Content.Headers.ContentType = new MediaTypeHeaderValue("image/jpeg"); return result; } The image retrieval from Amazon is contrived, but it shows how to return binary data using ByteArrayContent. It also demonstrates that you can easily return multiple types of content from a single controller method, which is actually quite common. If an error occurs - such as a resource can’t be found or a validation error – you can return an error response to the client that’s very specific to the error. In GetAlbumArt(), if the album can’t be found, we want to return a 404 Not Found status (and realistically no error, as it’s an image). Note that if you are not using HTTP Verb-based routing or not accessing a method that starts with Get/Post etc., you have to specify one or more HTTP Verb attributes on the method explicitly. Here, I used the [HttpGet] attribute to serve the image. Another option to handle the error could be to return a fixed placeholder image if no album could be matched or the album doesn’t have an image. When returning an error code, you can also return a strongly typed response to the client. For example, you can set the 404 status code and also return a custom error object (ApiMessageError is a class I defined) like this:return Request.CreateResponse<ApiMessageError>( HttpStatusCode.NotFound, new ApiMessageError("Album not found") ); If the album can be found, the image will be returned. The image is downloaded into a byte[] array, and then assigned to the result’s Content property. I created a new ByteArrayContent instance and assigned the image’s bytes and the content type so that it displays properly in the browser. There are other content classes available: StringContent, StreamContent, ByteArrayContent, MultipartContent, and ObjectContent are at your disposal to return just about any kind of content. You can create your own Content classes if you frequently return custom types and handle the default formatter assignments that should be used to send the data out . Although HttpResponseMessage results require more code than returning a plain .NET value from a method, it allows much more control over the actual HTTP processing than automatic processing. It also makes it much easier to test your controller methods as you get a response object that you can check for specific status codes and output messages rather than just a result value. Routing Again Ok, let’s get back to the image example. Using the original routing we have setup using HTTP Verb routing there's no good way to serve the image. In order to return my album art image I’d like to use a URL like this: http://localhost/aspnetWebApi/albums/Dirty%20Deeds/image In order to create a URL like this, I have to create a new Controller because my earlier routes pointed to the AlbumApiController using HTTP Verb routing. HTTP Verb based routing is great for representing a single set of resources such as albums. You can map operations like add, delete, update and read easily using HTTP Verbs. But you cannot mix action based routing into a an HTTP Verb routing controller - you can only map HTTP Verbs and each method has to be unique based on parameter signature. You can't have multiple GET operations to methods with the same signature. So GetImage(string id) and GetAlbum(string title) are in conflict in an HTTP GET routing scenario. In fact, I was unable to make the above Image URL work with any combination of HTTP Verb plus Custom routing using the single Albums controller. There are number of ways around this, but all involve additional controllers. Personally, I think it’s easier to use explicit Action routing and then add custom routes if you need to simplify your URLs further. So in order to accommodate some of the other examples, I created another controller – AlbumRpcApiController – to handle all requests that are explicitly routed via actions (/albums/rpc/AlbumArt) or are custom routed with explicit routes defined in the HttpConfiguration. I added the AlbumArt() method to this new AlbumRpcApiController class. For the image URL to work with the new AlbumRpcApiController, you need a custom route placed before the default route from Listing 1.RouteTable.Routes.MapHttpRoute( name: "AlbumRpcApiAction", routeTemplate: "albums/rpc/{action}/{title}", defaults: new { title = RouteParameter.Optional, controller = "AlbumRpcApi", action = "GetAblums" } ); Now I can use either of the following URLs to access the image: Custom route: (/albums/rpc/{title}/image)http://localhost/aspnetWebApi/albums/PowerAge/image Action route: (/albums/rpc/action/{title})http://localhost/aspnetWebAPI/albums/rpc/albumart/PowerAge Sending Data to the Server To send data to the server and add a new album, you can use an HTTP POST operation. Since I’m using HTTP Verb-based routing in the original AlbumApiController, I can implement a method called PostAlbum()to accept a new album from the client. Listing 6 shows the Web API code to add a new album.public HttpResponseMessage PostAlbum(Album album) { if (!this.ModelState.IsValid) { // my custom error class var error = new ApiMessageError() { message = "Model is invalid" }; // add errors into our client error model for client foreach (var prop in ModelState.Values) { var modelError = prop.Errors.FirstOrDefault(); if (!string.IsNullOrEmpty(modelError.ErrorMessage)) error.errors.Add(modelError.ErrorMessage); else error.errors.Add(modelError.Exception.Message); } return Request.CreateResponse<ApiMessageError>(HttpStatusCode.Conflict, error); } // update song id which isn't provided foreach (var song in album.Songs) song.AlbumId = album.Id; // see if album exists already var matchedAlbum = AlbumData.Current .SingleOrDefault(alb => alb.Id == album.Id || alb.AlbumName == album.AlbumName); if (matchedAlbum == null) AlbumData.Current.Add(album); else matchedAlbum = album; // return a string to show that the value got here var resp = Request.CreateResponse(HttpStatusCode.OK, string.Empty); resp.Content = new StringContent(album.AlbumName + " " + album.Entered.ToString(), Encoding.UTF8, "text/plain"); return resp; } The PostAlbum() method receives an album parameter, which is automatically deserialized from the POST buffer the client sent. The data passed from the client can be either XML or JSON. Web API automatically figures out what format it needs to deserialize based on the content type and binds the content to the album object. Web API uses model binding to bind the request content to the parameter(s) of controller methods. Like MVC you can check the model by looking at ModelState.IsValid. If it’s not valid, you can run through the ModelState.Values collection and check each binding for errors. Here I collect the error messages into a string array that gets passed back to the client via the result ApiErrorMessage object. When a binding error occurs, you’ll want to return an HTTP error response and it’s best to do that with an HttpResponseMessage result. In Listing 6, I used a custom error class that holds a message and an array of detailed error messages for each binding error. I used this object as the content to return to the client along with my Conflict HTTP Status Code response. If binding succeeds, the example returns a string with the name and date entered to demonstrate that you captured the data. Normally, a method like this should return a Boolean or no response at all (HttpStatusCode.NoConent). The sample uses a simple static list to hold albums, so once you’ve added the album using the Post operation, you can hit the /albums/ URL to see that the new album was added. The client jQuery code to call the POST operation from the client with jQuery is shown in Listing 7. var id = new Date().getTime().toString(); var album = { "Id": id, "AlbumName": "Power Age", "Artist": "AC/DC", "YearReleased": 1977, "Entered": "2002-03-11T18:24:43.5580794-10:00", "AlbumImageUrl": http://ecx.images-amazon.com/images/…, "AmazonUrl": http://www.amazon.com/…, "Songs": [ { "SongName": "Rock 'n Roll Damnation", "SongLength": 3.12}, { "SongName": "Downpayment Blues", "SongLength": 4.22 }, { "SongName": "Riff Raff", "SongLength": 2.42 } ] } $.ajax( { url: "albums/", type: "POST", contentType: "application/json", data: JSON.stringify(album), processData: false, beforeSend: function (xhr) { // not required since JSON is default output xhr.setRequestHeader("Accept", "application/json"); }, success: function (result) { // reload list of albums page.loadAlbums(); }, error: function (xhr, status, p3, p4) { var err = "Error"; if (xhr.responseText && xhr.responseText[0] == "{") err = JSON.parse(xhr.responseText).message; alert(err); } }); The code in Listing 7 creates an album object in JavaScript to match the structure of the .NET Album class. This object is passed to the $.ajax() function to send to the server as POST. The data is turned into JSON and the content type set to application/json so that the server knows what to convert when deserializing in the Album instance. The jQuery code hooks up success and failure events. Success returns the result data, which is a string that’s echoed back with an alert box. If an error occurs, jQuery returns the XHR instance and status code. You can check the XHR to see if a JSON object is embedded and if it is, you can extract it by de-serializing it and accessing the .message property. REST standards suggest that updates to existing resources should use PUT operations. REST standards aside, I’m not a big fan of separating out inserts and updates so I tend to have a single method that handles both. But if you want to follow REST suggestions, you can create a PUT method that handles updates by forwarding the PUT operation to the POST method:public HttpResponseMessage PutAlbum(Album album) { return PostAlbum(album); } To make the corresponding $.ajax() call, all you have to change from Listing 7 is the type: from POST to PUT. Model Binding with UrlEncoded POST Variables In the example in Listing 7 I used JSON objects to post a serialized object to a server method that accepted an strongly typed object with the same structure, which is a common way to send data to the server. However, Web API supports a number of different ways that data can be received by server methods. For example, another common way is to use plain UrlEncoded POST values to send to the server. Web API supports Model Binding that works similar (but not the same) as MVC's model binding where POST variables are mapped to properties of object parameters of the target method. This is actually quite common for AJAX calls that want to avoid serialization and the potential requirement of a JSON parser on older browsers. For example, using jQUery you might use the $.post() method to send a new album to the server (albeit one without songs) using code like the following:$.post("albums/",{AlbumName: "Dirty Deeds", YearReleased: 1976 … },albumPostCallback); Although the code looks very similar to the client code we used before passing JSON, here the data passed is URL encoded values (AlbumName=Dirty+Deeds&YearReleased=1976 etc.). Web API then takes this POST data and maps each of the POST values to the properties of the Album object in the method's parameter. Although the client code is different the server can both handle the JSON object, or the UrlEncoded POST values. Dynamic Access to POST Data There are also a few options available to dynamically access POST data, if you know what type of data you're dealing with. If you have POST UrlEncoded values, you can dynamically using a FormsDataCollection:[HttpPost] public string PostAlbum(FormDataCollection form) { return string.Format("{0} - released {1}", form.Get("AlbumName"),form.Get("RearReleased")); } The FormDataCollection is a very simple object, that essentially provides the same functionality as Request.Form[] in ASP.NET. Request.Form[] still works if you're running hosted in an ASP.NET application. However as a general rule, while ASP.NET's functionality is always available when running Web API hosted inside of an ASP.NET application, using the built in classes specific to Web API makes it possible to run Web API applications in a self hosted environment outside of ASP.NET. If your client is sending JSON to your server, and you don't want to map the JSON to a strongly typed object because you only want to retrieve a few simple values, you can also accept a JObject parameter in your API methods:[HttpPost] public string PostAlbum(JObject jsonData) { dynamic json = jsonData; JObject jalbum = json.Album; JObject juser = json.User; string token = json.UserToken; var album = jalbum.ToObject<Album>(); var user = juser.ToObject<User>(); return String.Format("{0} {1} {2}", album.AlbumName, user.Name, token); } There quite a few options available to you to receive data with Web API, which gives you more choices for the right tool for the job. Unfortunately one shortcoming of Web API is that POST data is always mapped to a single parameter. This means you can't pass multiple POST parameters to methods that receive POST data. It's possible to accept multiple parameters, but only one can map to the POST content - the others have to come from the query string or route values. I have a couple of Blog POSTs that explain what works and what doesn't here: Passing multiple POST parameters to Web API Controller Methods Mapping UrlEncoded POST Values in ASP.NET Web API Handling Delete Operations Finally, to round out the server API code of the album example we've been discussin, here’s the DELETE verb controller method that allows removal of an album by its title:public HttpResponseMessage DeleteAlbum(string title) { var matchedAlbum = AlbumData.Current.Where(alb => alb.AlbumName == title) .SingleOrDefault(); if (matchedAlbum == null) return new HttpResponseMessage(HttpStatusCode.NotFound); AlbumData.Current.Remove(matchedAlbum); return new HttpResponseMessage(HttpStatusCode.NoContent); } To call this action method using jQuery, you can use:$(".removeimage").live("click", function () { var $el = $(this).parent(".album"); var txt = $el.find("a").text(); $.ajax({ url: "albums/" + encodeURIComponent(txt), type: "Delete", success: function (result) { $el.fadeOut().remove(); }, error: jqError }); } Note the use of the DELETE verb in the $.ajax() call, which routes to DeleteAlbum on the server. DELETE is a non-content operation, so you supply a resource ID (the title) via route value or the querystring. Routing Conflicts In all requests with the exception of the AlbumArt image example shown so far, I used HTTP Verb routing that I set up in Listing 1. HTTP Verb Routing is a recommendation that is in line with typical REST access to HTTP resources. However, it takes quite a bit of effort to create REST-compliant API implementations based only on HTTP Verb routing only. You saw one example that didn’t really fit – the return of an image where I created a custom route albums/{title}/image that required creation of a second controller and a custom route to work. HTTP Verb routing to a controller does not mix with custom or action routing to the same controller because of the limited mapping of HTTP verbs imposed by HTTP Verb routing. To understand some of the problems with verb routing, let’s look at another example. Let’s say you create a GetSortableAlbums() method like this and add it to the original AlbumApiController accessed via HTTP Verb routing:[HttpGet] public IQueryable<Album> SortableAlbums() { var albums = AlbumData.Current; // generally should be done only on actual queryable results (EF etc.) // Done here because we're running with a static list but otherwise might be slow return albums.AsQueryable(); } If you compile this code and try to now access the /albums/ link, you get an error: Multiple Actions were found that match the request. HTTP Verb routing only allows access to one GET operation per parameter/route value match. If more than one method exists with the same parameter signature, it doesn’t work. As I mentioned earlier for the image display, the only solution to get this method to work is to throw it into another controller. Because I already set up the AlbumRpcApiController I can add the method there. First, I should rename the method to SortableAlbums() so I’m not using a Get prefix for the method. This also makes the action parameter look cleaner in the URL - it looks less like a method and more like a noun. I can then create a new route that handles direct-action mapping:RouteTable.Routes.MapHttpRoute( name: "AlbumRpcApiAction", routeTemplate: "albums/rpc/{action}/{title}", defaults: new { title = RouteParameter.Optional, controller = "AlbumRpcApi", action = "GetAblums" } ); As I am explicitly adding a route segment – rpc – into the route template, I can now reference explicit methods in the Web API controller using URLs like this: http://localhost/AspNetWebApi/rpc/SortableAlbums Error Handling I’ve already done some minimal error handling in the examples. For example in Listing 6, I detected some known-error scenarios like model validation failing or a resource not being found and returning an appropriate HttpResponseMessage result. But what happens if your code just blows up or causes an exception? If you have a controller method, like this:[HttpGet] public void ThrowException() { throw new UnauthorizedAccessException("Unauthorized Access Sucka"); } You can call it with this: http://localhost/AspNetWebApi/albums/rpc/ThrowException The default exception handling displays a 500-status response with the serialized exception on the local computer only. When you connect from a remote computer, Web API throws back a 500 HTTP Error with no data returned (IIS then adds its HTML error page). The behavior is configurable in the GlobalConfiguration:GlobalConfiguration .Configuration .IncludeErrorDetailPolicy = IncludeErrorDetailPolicy.Never; If you want more control over your error responses sent from code, you can throw explicit error responses yourself using HttpResponseException. When you throw an HttpResponseException the response parameter is used to generate the output for the Controller action. [HttpGet] public void ThrowError() { var resp = Request.CreateResponse<ApiMessageError>( HttpStatusCode.BadRequest, new ApiMessageError("Your code stinks!")); throw new HttpResponseException(resp); } Throwing an HttpResponseException stops the processing of the controller method and immediately returns the response you passed to the exception. Unlike other Exceptions fired inside of WebAPI, HttpResponseException bypasses the Exception Filters installed and instead just outputs the response you provide. In this case, the serialized ApiMessageError result string is returned in the default serialization format – XML or JSON. You can pass any content to HttpResponseMessage, which includes creating your own exception objects and consistently returning error messages to the client. Here’s a small helper method on the controller that you might use to send exception info back to the client consistently:private void ThrowSafeException(string message, HttpStatusCode statusCode = HttpStatusCode.BadRequest) { var errResponse = Request.CreateResponse<ApiMessageError>(statusCode, new ApiMessageError() { message = message }); throw new HttpResponseException(errResponse); } You can then use it to output any captured errors from code:[HttpGet] public void ThrowErrorSafe() { try { List<string> list = null; list.Add("Rick"); } catch (Exception ex) { ThrowSafeException(ex.Message); } } Exception Filters Another more global solution is to create an Exception Filter. Filters in Web API provide the ability to pre- and post-process controller method operations. An exception filter looks at all exceptions fired and then optionally creates an HttpResponseMessage result. Listing 8 shows an example of a basic Exception filter implementation.public class UnhandledExceptionFilter : ExceptionFilterAttribute { public override void OnException(HttpActionExecutedContext context) { HttpStatusCode status = HttpStatusCode.InternalServerError; var exType = context.Exception.GetType(); if (exType == typeof(UnauthorizedAccessException)) status = HttpStatusCode.Unauthorized; else if (exType == typeof(ArgumentException)) status = HttpStatusCode.NotFound; var apiError = new ApiMessageError() { message = context.Exception.Message }; // create a new response and attach our ApiError object // which now gets returned on ANY exception result var errorResponse = context.Request.CreateResponse<ApiMessageError>(status, apiError); context.Response = errorResponse; base.OnException(context); } } Exception Filter Attributes can be assigned to an ApiController class like this:[UnhandledExceptionFilter] public class AlbumRpcApiController : ApiController or you can globally assign it to all controllers by adding it to the HTTP Configuration's Filters collection:GlobalConfiguration.Configuration.Filters.Add(new UnhandledExceptionFilter()); The latter is a great way to get global error trapping so that all errors (short of hard IIS errors and explicit HttpResponseException errors) return a valid error response that includes error information in the form of a known-error object. Using a filter like this allows you to throw an exception as you normally would and have your filter create a response in the appropriate output format that the client expects. For example, an AJAX application can on failure expect to see a JSON error result that corresponds to the real error that occurred rather than a 500 error along with HTML error page that IIS throws up. You can even create some custom exceptions so you can differentiate your own exceptions from unhandled system exceptions - you often don't want to display error information from 'unknown' exceptions as they may contain sensitive system information or info that's not generally useful to users of your application/site. This is just one example of how ASP.NET Web API is configurable and extensible. Exception filters are just one example of how you can plug-in into the Web API request flow to modify output. Many more hooks exist and I’ll take a closer look at extensibility in Part 2 of this article in the future. Summary Web API is a big improvement over previous Microsoft REST and AJAX toolkits. The key features to its usefulness are its ease of use with simple controller based logic, familiar MVC-style routing, low configuration impact, extensibility at all levels and tight attention to exposing and making HTTP semantics easily discoverable and easy to use. Although none of the concepts used in Web API are new or radical, Web API combines the best of previous platforms into a single framework that’s highly functional, easy to work with, and extensible to boot. I think that Microsoft has hit a home run with Web API. Related Resources Where does ASP.NET Web API fit? Sample Source Code on GitHub Passing multiple POST parameters to Web API Controller Methods Mapping UrlEncoded POST Values in ASP.NET Web API Creating a JSONP Formatter for ASP.NET Web API Removing the XML Formatter from ASP.NET Web API Applications© Rick Strahl, West Wind Technologies, 2005-2012Posted in Web Api Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article

Optimized OCR black/white pixel algorithm

- by eagle

I am writing a simple OCR solution for a finite set of characters. That is, I know the exact way all 26 letters in the alphabet will look like. I am using C# and am able to easily determine if a given pixel should be treated as black or white. I am generating a matrix of black/white pixels for every single character. So for example, the letter I (capital i), might look like the following: 01110 00100 00100 00100 01110 Note: all points, which I use later in this post, assume that the top left pixel is (0, 0), bottom right pixel is (4, 4). 1's represent black pixels, and 0's represent white pixels. I would create a corresponding matrix in C# like this: CreateLetter("I", new List<List<bool>>() { new List<bool>() { false, true, true, true, false }, new List<bool>() { false, false, true, false, false }, new List<bool>() { false, false, true, false, false }, new List<bool>() { false, false, true, false, false }, new List<bool>() { false, true, true, true, false } }); I know I could probably optimize this part by using a multi-dimensional array instead, but let's ignore that for now, this is for illustrative purposes. Every letter is exactly the same dimensions, 10px by 11px (10px by 11px is the actual dimensions of a character in my real program. I simplified this to 5px by 5px in this posting since it is much easier to "draw" the letters using 0's and 1's on a smaller image). Now when I give it a 10px by 11px part of an image to analyze with OCR, it would need to run on every single letter (26) on every single pixel (10 * 11 = 110) which would mean 2,860 (26 * 110) iterations (in the worst case) for every single character. I was thinking this could be optimized by defining the unique characteristics of every character. So, for example, let's assume that the set of characters only consists of 5 distinct letters: I, A, O, B, and L. These might look like the following: 01110 00100 00100 01100 01000 00100 01010 01010 01010 01000 00100 01110 01010 01100 01000 00100 01010 01010 01010 01000 01110 01010 00100 01100 01110 After analyzing the unique characteristics of every character, I can significantly reduce the number of tests that need to be performed to test for a character. For example, for the "I" character, I could define it's unique characteristics as having a black pixel in the coordinate (3, 0) since no other characters have that pixel as black. So instead of testing 110 pixels for a match on the "I" character, I reduced it to a 1 pixel test. This is what it might look like for all these characters: var LetterI = new OcrLetter() { Name = "I", BlackPixels = new List<Point>() { new Point (3, 0) } } var LetterA = new OcrLetter() { Name = "A", WhitePixels = new List<Point>() { new Point(2, 4) } } var LetterO = new OcrLetter() { Name = "O", BlackPixels = new List<Point>() { new Point(3, 2) }, WhitePixels = new List<Point>() { new Point(2, 2) } } var LetterB = new OcrLetter() { Name = "B", BlackPixels = new List<Point>() { new Point(3, 1) }, WhitePixels = new List<Point>() { new Point(3, 2) } } var LetterL = new OcrLetter() { Name = "L", BlackPixels = new List<Point>() { new Point(1, 1), new Point(3, 4) }, WhitePixels = new List<Point>() { new Point(2, 2) } } This is challenging to do manually for 5 characters and gets much harder the greater the amount of letters that are added. You also want to guarantee that you have the minimum set of unique characteristics of a letter since you want it to be optimized as much as possible. I want to create an algorithm that will identify the unique characteristics of all the letters and would generate similar code to that above. I would then use this optimized black/white matrix to identify characters. How do I take the 26 letters that have all their black/white pixels filled in (e.g. the CreateLetter code block) and convert them to an optimized set of unique characteristics that define a letter (e.g. the new OcrLetter() code block)? And how would I guarantee that it is the most efficient definition set of unique characteristics (e.g. instead of defining 6 points as the unique characteristics, there might be a way to do it with 1 or 2 points, as the letter "I" in my example was able to). An alternative solution I've come up with is using a hash table, which will reduce it from 2,860 iterations to 110 iterations, a 26 time reduction. This is how it might work: I would populate it with data similar to the following: Letters["01110 00100 00100 00100 01110"] = "I"; Letters["00100 01010 01110 01010 01010"] = "A"; Letters["00100 01010 01010 01010 00100"] = "O"; Letters["01100 01010 01100 01010 01100"] = "B"; Now when I reach a location in the image to process, I convert it to a string such as: "01110 00100 00100 00100 01110" and simply find it in the hash table. This solution seems very simple, however, this still requires 110 iterations to generate this string for each letter. In big O notation, the algorithm is the same since O(110N) = O(2860N) = O(N) for N letters to process on the page. However, it is still improved by a constant factor of 26, a significant improvement (e.g. instead of it taking 26 minutes, it would take 1 minute). Update: Most of the solutions provided so far have not addressed the issue of identifying the unique characteristics of a character and rather provide alternative solutions. I am still looking for this solution which, as far as I can tell, is the only way to achieve the fastest OCR processing. I just came up with a partial solution: For each pixel, in the grid, store the letters that have it as a black pixel. Using these letters: I A O B L 01110 00100 00100 01100 01000 00100 01010 01010 01010 01000 00100 01110 01010 01100 01000 00100 01010 01010 01010 01000 01110 01010 00100 01100 01110 You would have something like this: CreatePixel(new Point(0, 0), new List<Char>() { }); CreatePixel(new Point(1, 0), new List<Char>() { 'I', 'B', 'L' }); CreatePixel(new Point(2, 0), new List<Char>() { 'I', 'A', 'O', 'B' }); CreatePixel(new Point(3, 0), new List<Char>() { 'I' }); CreatePixel(new Point(4, 0), new List<Char>() { }); CreatePixel(new Point(0, 1), new List<Char>() { }); CreatePixel(new Point(1, 1), new List<Char>() { 'A', 'B', 'L' }); CreatePixel(new Point(2, 1), new List<Char>() { 'I' }); CreatePixel(new Point(3, 1), new List<Char>() { 'A', 'O', 'B' }); // ... CreatePixel(new Point(2, 2), new List<Char>() { 'I', 'A', 'B' }); CreatePixel(new Point(3, 2), new List<Char>() { 'A', 'O' }); // ... CreatePixel(new Point(2, 4), new List<Char>() { 'I', 'O', 'B', 'L' }); CreatePixel(new Point(3, 4), new List<Char>() { 'I', 'A', 'L' }); CreatePixel(new Point(4, 4), new List<Char>() { }); Now for every letter, in order to find the unique characteristics, you need to look at which buckets it belongs to, as well as the amount of other characters in the bucket. So let's take the example of "I". We go to all the buckets it belongs to (1,0; 2,0; 3,0; ...; 3,4) and see that the one with the least amount of other characters is (3,0). In fact, it only has 1 character, meaning it must be an "I" in this case, and we found our unique characteristic. You can also do the same for pixels that would be white. Notice that bucket (2,0) contains all the letters except for "L", this means that it could be used as a white pixel test. Similarly, (2,4) doesn't contain an 'A'. Buckets that either contain all the letters or none of the letters can be discarded immediately, since these pixels can't help define a unique characteristic (e.g. 1,1; 4,0; 0,1; 4,4). It gets trickier when you don't have a 1 pixel test for a letter, for example in the case of 'O' and 'B'. Let's walk through the test for 'O'... It's contained in the following buckets: // Bucket Count Letters // 2,0 4 I, A, O, B // 3,1 3 A, O, B // 3,2 2 A, O // 2,4 4 I, O, B, L Additionally, we also have a few white pixel tests that can help: (I only listed those that are missing at most 2). The Missing Count was calculated as (5 - Bucket.Count). // Bucket Missing Count Missing Letters // 1,0 2 A, O // 1,1 2 I, O // 2,2 2 O, L // 3,4 2 O, B So now we can take the shortest black pixel bucket (3,2) and see that when we test for (3,2) we know it is either an 'A' or an 'O'. So we need an easy way to tell the difference between an 'A' and an 'O'. We could either look for a black pixel bucket that contains 'O' but not 'A' (e.g. 2,4) or a white pixel bucket that contains an 'O' but not an 'A' (e.g. 1,1). Either of these could be used in combination with the (3,2) pixel to uniquely identify the letter 'O' with only 2 tests. This seems like a simple algorithm when there are 5 characters, but how would I do this when there are 26 letters and a lot more pixels overlapping? For example, let's say that after the (3,2) pixel test, it found 10 different characters that contain the pixel (and this was the least from all the buckets). Now I need to find differences from 9 other characters instead of only 1 other character. How would I achieve my goal of getting the least amount of checks as possible, and ensure that I am not running extraneous tests?

Read the article

Zen and the Art of File and Folder Organization

- by Mark Virtue

Is your desk a paragon of neatness, or does it look like a paper-bomb has gone off? If you’ve been putting off getting organized because the task is too huge or daunting, or you don’t know where to start, we’ve got 40 tips to get you on the path to zen mastery of your filing system. For all those readers who would like to get their files and folders organized, or, if they’re already organized, better organized—we have compiled a complete guide to getting organized and staying organized, a comprehensive article that will hopefully cover every possible tip you could want. Signs that Your Computer is Poorly Organized If your computer is a mess, you’re probably already aware of it. But just in case you’re not, here are some tell-tale signs: Your Desktop has over 40 icons on it “My Documents” contains over 300 files and 60 folders, including MP3s and digital photos You use the Windows’ built-in search facility whenever you need to find a file You can’t find programs in the out-of-control list of programs in your Start Menu You save all your Word documents in one folder, all your spreadsheets in a second folder, etc Any given file that you’re looking for may be in any one of four different sets of folders But before we start, here are some quick notes: We’re going to assume you know what files and folders are, and how to create, save, rename, copy and delete them The organization principles described in this article apply equally to all computer systems. However, the screenshots here will reflect how things look on Windows (usually Windows 7). We will also mention some useful features of Windows that can help you get organized. Everyone has their own favorite methodology of organizing and filing, and it’s all too easy to get into “My Way is Better than Your Way” arguments. The reality is that there is no perfect way of getting things organized. When I wrote this article, I tried to keep a generalist and objective viewpoint. I consider myself to be unusually well organized (to the point of obsession, truth be told), and I’ve had 25 years experience in collecting and organizing files on computers. So I’ve got a lot to say on the subject. But the tips I have described here are only one way of doing it. Hopefully some of these tips will work for you too, but please don’t read this as any sort of “right” way to do it. At the end of the article we’ll be asking you, the reader, for your own organization tips. Why Bother Organizing At All? For some, the answer to this question is self-evident. And yet, in this era of powerful desktop search software (the search capabilities built into the Windows Vista and Windows 7 Start Menus, and third-party programs like Google Desktop Search), the question does need to be asked, and answered. I have a friend who puts every file he ever creates, receives or downloads into his My Documents folder and doesn’t bother filing them into subfolders at all. He relies on the search functionality built into his Windows operating system to help him find whatever he’s looking for. And he always finds it. He’s a Search Samurai. For him, filing is a waste of valuable time that could be spent enjoying life! It’s tempting to follow suit. On the face of it, why would anyone bother to take the time to organize their hard disk when such excellent search software is available? Well, if all you ever want to do with the files you own is to locate and open them individually (for listening, editing, etc), then there’s no reason to ever bother doing one scrap of organization. But consider these common tasks that are not achievable with desktop search software: Find files manually. Often it’s not convenient, speedy or even possible to utilize your desktop search software to find what you want. It doesn’t work 100% of the time, or you may not even have it installed. Sometimes its just plain faster to go straight to the file you want, if you know it’s in a particular sub-folder, rather than trawling through hundreds of search results. Find groups of similar files (e.g. all your “work” files, all the photos of your Europe holiday in 2008, all your music videos, all the MP3s from Dark Side of the Moon, all your letters you wrote to your wife, all your tax returns). Clever naming of the files will only get you so far. Sometimes it’s the date the file was created that’s important, other times it’s the file format, and other times it’s the purpose of the file. How do you name a collection of files so that they’re easy to isolate based on any of the above criteria? Short answer, you can’t. Move files to a new computer. It’s time to upgrade your computer. How do you quickly grab all the files that are important to you? Or you decide to have two computers now – one for home and one for work. How do you quickly isolate only the work-related files to move them to the work computer? Synchronize files to other computers. If you have more than one computer, and you need to mirror some of your files onto the other computer (e.g. your music collection), then you need a way to quickly determine which files are to be synced and which are not. Surely you don’t want to synchronize everything? Choose which files to back up. If your backup regime calls for multiple backups, or requires speedy backups, then you’ll need to be able to specify which files are to be backed up, and which are not. This is not possible if they’re all in the same folder. Finally, if you’re simply someone who takes pleasure in being organized, tidy and ordered (me! me!), then you don’t even need a reason. Being disorganized is simply unthinkable. Tips on Getting Organized Here we present our 40 best tips on how to get organized. Or, if you’re already organized, to get better organized. Tip #1. Choose Your Organization System Carefully The reason that most people are not organized is that it takes time. And the first thing that takes time is deciding upon a system of organization. This is always a matter of personal preference, and is not something that a geek on a website can tell you. You should always choose your own system, based on how your own brain is organized (which makes the assumption that your brain is, in fact, organized). We can’t instruct you, but we can make suggestions: You may want to start off with a system based on the users of the computer. i.e. “My Files”, “My Wife’s Files”, My Son’s Files”, etc. Inside “My Files”, you might then break it down into “Personal” and “Business”. You may then realize that there are overlaps. For example, everyone may want to share access to the music library, or the photos from the school play. So you may create another folder called “Family”, for the “common” files. You may decide that the highest-level breakdown of your files is based on the “source” of each file. In other words, who created the files. You could have “Files created by ME (business or personal)”, “Files created by people I know (family, friends, etc)”, and finally “Files created by the rest of the world (MP3 music files, downloaded or ripped movies or TV shows, software installation files, gorgeous desktop wallpaper images you’ve collected, etc).” This system happens to be the one I use myself. See below: Mark is for files created by meVC is for files created by my company (Virtual Creations)Others is for files created by my friends and familyData is the rest of the worldAlso, Settings is where I store the configuration files and other program data files for my installed software (more on this in tip #34, below). Each folder will present its own particular set of requirements for further sub-organization. For example, you may decide to organize your music collection into sub-folders based on the artist’s name, while your digital photos might get organized based on the date they were taken. It can be different for every sub-folder! Another strategy would be based on “currentness”. Files you have yet to open and look at live in one folder. Ones that have been looked at but not yet filed live in another place. Current, active projects live in yet another place. All other files (your “archive”, if you like) would live in a fourth folder. (And of course, within that last folder you’d need to create a further sub-system based on one of the previous bullet points). Put some thought into this – changing it when it proves incomplete can be a big hassle! Before you go to the trouble of implementing any system you come up with, examine a wide cross-section of the files you own and see if they will all be able to find a nice logical place to sit within your system. Tip #2. When You Decide on Your System, Stick to It! There’s nothing more pointless than going to all the trouble of creating a system and filing all your files, and then whenever you create, receive or download a new file, you simply dump it onto your Desktop. You need to be disciplined – forever! Every new file you get, spend those extra few seconds to file it where it belongs! Otherwise, in just a month or two, you’ll be worse off than before – half your files will be organized and half will be disorganized – and you won’t know which is which! Tip #3. Choose the Root Folder of Your Structure Carefully Every data file (document, photo, music file, etc) that you create, own or is important to you, no matter where it came from, should be found within one single folder, and that one single folder should be located at the root of your C: drive (as a sub-folder of C:\). In other words, do not base your folder structure in standard folders like “My Documents”. If you do, then you’re leaving it up to the operating system engineers to decide what folder structure is best for you. And every operating system has a different system! In Windows 7 your files are found in C:\Users\YourName, whilst on Windows XP it was C:\Documents and Settings\YourName\My Documents. In UNIX systems it’s often /home/YourName. These standard default folders tend to fill up with junk files and folders that are not at all important to you. “My Documents” is the worst offender. Every second piece of software you install, it seems, likes to create its own folder in the “My Documents” folder. These folders usually don’t fit within your organizational structure, so don’t use them! In fact, don’t even use the “My Documents” folder at all. Allow it to fill up with junk, and then simply ignore it. It sounds heretical, but: Don’t ever visit your “My Documents” folder! Remove your icons/links to “My Documents” and replace them with links to the folders you created and you care about! Create your own file system from scratch! Probably the best place to put it would be on your D: drive – if you have one. This way, all your files live on one drive, while all the operating system and software component files live on the C: drive – simply and elegantly separated. The benefits of that are profound. Not only are there obvious organizational benefits (see tip #10, below), but when it comes to migrate your data to a new computer, you can (sometimes) simply unplug your D: drive and plug it in as the D: drive of your new computer (this implies that the D: drive is actually a separate physical disk, and not a partition on the same disk as C:). You also get a slight speed improvement (again, only if your C: and D: drives are on separate physical disks). Warning: From tip #12, below, you will see that it’s actually a good idea to have exactly the same file system structure – including the drive it’s filed on – on all of the computers you own. So if you decide to use the D: drive as the storage system for your own files, make sure you are able to use the D: drive on all the computers you own. If you can’t ensure that, then you can still use a clever geeky trick to store your files on the D: drive, but still access them all via the C: drive (see tip #17, below). If you only have one hard disk (C:), then create a dedicated folder that will contain all your files – something like C:\Files. The name of the folder is not important, but make it a single, brief word. There are several reasons for this: When creating a backup regime, it’s easy to decide what files should be backed up – they’re all in the one folder! If you ever decide to trade in your computer for a new one, you know exactly which files to migrate You will always know where to begin a search for any file If you synchronize files with other computers, it makes your synchronization routines very simple. It also causes all your shortcuts to continue to work on the other machines (more about this in tip #24, below). Once you’ve decided where your files should go, then put all your files in there – Everything! Completely disregard the standard, default folders that are created for you by the operating system (“My Music”, “My Pictures”, etc). In fact, you can actually relocate many of those folders into your own structure (more about that below, in tip #6). The more completely you get all your data files (documents, photos, music, etc) and all your configuration settings into that one folder, then the easier it will be to perform all of the above tasks. Once this has been done, and all your files live in one folder, all the other folders in C:\ can be thought of as “operating system” folders, and therefore of little day-to-day interest for us. Here’s a screenshot of a nicely organized C: drive, where all user files are located within the \Files folder: Tip #4. Use Sub-Folders This would be our simplest and most obvious tip. It almost goes without saying. Any organizational system you decide upon (see tip #1) will require that you create sub-folders for your files. Get used to creating folders on a regular basis. Tip #5. Don’t be Shy About Depth Create as many levels of sub-folders as you need. Don’t be scared to do so. Every time you notice an opportunity to group a set of related files into a sub-folder, do so. Examples might include: All the MP3s from one music CD, all the photos from one holiday, or all the documents from one client. It’s perfectly okay to put files into a folder called C:\Files\Me\From Others\Services\WestCo Bank\Statements\2009. That’s only seven levels deep. Ten levels is not uncommon. Of course, it’s possible to take this too far. If you notice yourself creating a sub-folder to hold only one file, then you’ve probably become a little over-zealous. On the other hand, if you simply create a structure with only two levels (for example C:\Files\Work) then you really haven’t achieved any level of organization at all (unless you own only six files!). Your “Work” folder will have become a dumping ground, just like your Desktop was, with most likely hundreds of files in it. Tip #6. Move the Standard User Folders into Your Own Folder Structure Most operating systems, including Windows, create a set of standard folders for each of its users. These folders then become the default location for files such as documents, music files, digital photos and downloaded Internet files. In Windows 7, the full list is shown below: Some of these folders you may never use nor care about (for example, the Favorites folder, if you’re not using Internet Explorer as your browser). Those ones you can leave where they are. But you may be using some of the other folders to store files that are important to you. Even if you’re not using them, Windows will still often treat them as the default storage location for many types of files. When you go to save a standard file type, it can become annoying to be automatically prompted to save it in a folder that’s not part of your own file structure. But there’s a simple solution: Move the folders you care about into your own folder structure! If you do, then the next time you go to save a file of the corresponding type, Windows will prompt you to save it in the new, moved location. Moving the folders is easy. Simply drag-and-drop them to the new location. Here’s a screenshot of the default My Music folder being moved to my custom personal folder (Mark): Tip #7. Name Files and Folders Intelligently This is another one that almost goes without saying, but we’ll say it anyway: Do not allow files to be created that have meaningless names like Document1.doc, or folders called New Folder (2). Take that extra 20 seconds and come up with a meaningful name for the file/folder – one that accurately divulges its contents without repeating the entire contents in the name. Tip #8. Watch Out for Long Filenames Another way to tell if you have not yet created enough depth to your folder hierarchy is that your files often require really long names. If you need to call a file Johnson Sales Figures March 2009.xls (which might happen to live in the same folder as Abercrombie Budget Report 2008.xls), then you might want to create some sub-folders so that the first file could be simply called March.xls, and living in the Clients\Johnson\Sales Figures\2009 folder. A well-placed file needs only a brief filename! Tip #9. Use Shortcuts! Everywhere! This is probably the single most useful and important tip we can offer. A shortcut allows a file to be in two places at once. Why would you want that? Well, the file and folder structure of every popular operating system on the market today is hierarchical. This means that all objects (files and folders) always live within exactly one parent folder. It’s a bit like a tree. A tree has branches (folders) and leaves (files). Each leaf, and each branch, is supported by exactly one parent branch, all the way back to the root of the tree (which, incidentally, is exactly why C:\ is called the “root folder” of the C: drive). That hard disks are structured this way may seem obvious and even necessary, but it’s only one way of organizing data. There are others: Relational databases, for example, organize structured data entirely differently. The main limitation of hierarchical filing structures is that a file can only ever be in one branch of the tree – in only one folder – at a time. Why is this a problem? Well, there are two main reasons why this limitation is a problem for computer users: The “correct” place for a file, according to our organizational rationale, is very often a very inconvenient place for that file to be located. Just because it’s correctly filed doesn’t mean it’s easy to get to. Your file may be “correctly” buried six levels deep in your sub-folder structure, but you may need regular and speedy access to this file every day. You could always move it to a more convenient location, but that would mean that you would need to re-file back to its “correct” location it every time you’d finished working on it. Most unsatisfactory. A file may simply “belong” in two or more different locations within your file structure. For example, say you’re an accountant and you have just completed the 2009 tax return for John Smith. It might make sense to you to call this file 2009 Tax Return.doc and file it under Clients\John Smith. But it may also be important to you to have the 2009 tax returns from all your clients together in the one place. So you might also want to call the file John Smith.doc and file it under Tax Returns\2009. The problem is, in a purely hierarchical filing system, you can’t put it in both places. Grrrrr! Fortunately, Windows (and most other operating systems) offers a way for you to do exactly that: It’s called a “shortcut” (also known as an “alias” on Macs and a “symbolic link” on UNIX systems). Shortcuts allow a file to exist in one place, and an icon that represents the file to be created and put anywhere else you please. In fact, you can create a dozen such icons and scatter them all over your hard disk. Double-clicking on one of these icons/shortcuts opens up the original file, just as if you had double-clicked on the original file itself. Consider the following two icons: The one on the left is the actual Word document, while the one on the right is a shortcut that represents the Word document. Double-clicking on either icon will open the same file. There are two main visual differences between the icons: The shortcut will have a small arrow in the lower-left-hand corner (on Windows, anyway) The shortcut is allowed to have a name that does not include the file extension (the “.docx” part, in this case) You can delete the shortcut at any time without losing any actual data. The original is still intact. All you lose is the ability to get to that data from wherever the shortcut was. So why are shortcuts so great? Because they allow us to easily overcome the main limitation of hierarchical file systems, and put a file in two (or more) places at the same time. You will always have files that don’t play nice with your organizational rationale, and can’t be filed in only one place. They demand to exist in two places. Shortcuts allow this! Furthermore, they allow you to collect your most often-opened files and folders together in one spot for convenient access. The cool part is that the original files stay where they are, safe forever in their perfectly organized location. So your collection of most often-opened files can – and should – become a collection of shortcuts! If you’re still not convinced of the utility of shortcuts, consider the following well-known areas of a typical Windows computer: The Start Menu (and all the programs that live within it) The Quick Launch bar (or the Superbar in Windows 7) The “Favorite folders” area in the top-left corner of the Windows Explorer window (in Windows Vista or Windows 7) Your Internet Explorer Favorites or Firefox Bookmarks Each item in each of these areas is a shortcut! Each of those areas exist for one purpose only: For convenience – to provide you with a collection of the files and folders you access most often. It should be easy to see by now that shortcuts are designed for one single purpose: To make accessing your files more convenient. Each time you double-click on a shortcut, you are saved the hassle of locating the file (or folder, or program, or drive, or control panel icon) that it represents. Shortcuts allow us to invent a golden rule of file and folder organization: “Only ever have one copy of a file – never have two copies of the same file. Use a shortcut instead” (this rule doesn’t apply to copies created for backup purposes, of course!) There are also lesser rules, like “don’t move a file into your work area – create a shortcut there instead”, and “any time you find yourself frustrated with how long it takes to locate a file, create a shortcut to it and place that shortcut in a convenient location.” So how to we create these massively useful shortcuts? There are two main ways: “Copy” the original file or folder (click on it and type Ctrl-C, or right-click on it and select Copy): Then right-click in an empty area of the destination folder (the place where you want the shortcut to go) and select Paste shortcut: Right-drag (drag with the right mouse button) the file from the source folder to the destination folder. When you let go of the mouse button at the destination folder, a menu pops up: Select Create shortcuts here. Note that when shortcuts are created, they are often named something like Shortcut to Budget Detail.doc (windows XP) or Budget Detail – Shortcut.doc (Windows 7). If you don’t like those extra words, you can easily rename the shortcuts after they’re created, or you can configure Windows to never insert the extra words in the first place (see our article on how to do this). And of course, you can create shortcuts to folders too, not just to files! Bottom line: Whenever you have a file that you’d like to access from somewhere else (whether it’s convenience you’re after, or because the file simply belongs in two places), create a shortcut to the original file in the new location. Tip #10. Separate Application Files from Data Files Any digital organization guru will drum this rule into you. Application files are the components of the software you’ve installed (e.g. Microsoft Word, Adobe Photoshop or Internet Explorer). Data files are the files that you’ve created for yourself using that software (e.g. Word Documents, digital photos, emails or playlists). Software gets installed, uninstalled and upgraded all the time. Hopefully you always have the original installation media (or downloaded set-up file) kept somewhere safe, and can thus reinstall your software at any time. This means that the software component files are of little importance. Whereas the files you have created with that software is, by definition, important. It’s a good rule to always separate unimportant files from important files. So when your software prompts you to save a file you’ve just created, take a moment and check out where it’s suggesting that you save the file. If it’s suggesting that you save the file into the same folder as the software itself, then definitely don’t follow that suggestion. File it in your own folder! In fact, see if you can find the program’s configuration option that determines where files are saved by default (if it has one), and change it. Tip #11. Organize Files Based on Purpose, Not on File Type If you have, for example a folder called Work\Clients\Johnson, and within that folder you have two sub-folders, Word Documents and Spreadsheets (in other words, you’re separating “.doc” files from “.xls” files), then chances are that you’re not optimally organized. It makes little sense to organize your files based on the program that created them. Instead, create your sub-folders based on the purpose of the file. For example, it would make more sense to create sub-folders called Correspondence and Financials. It may well be that all the files in a given sub-folder are of the same file-type, but this should be more of a coincidence and less of a design feature of your organization system. Tip #12. Maintain the Same Folder Structure on All Your Computers In other words, whatever organizational system you create, apply it to every computer that you can. There are several benefits to this: There’s less to remember. No matter where you are, you always know where to look for your files If you copy or synchronize files from one computer to another, then setting up the synchronization job becomes very simple Shortcuts can be copied or moved from one computer to another with ease (assuming the original files are also copied/moved). There’s no need to find the target of the shortcut all over again on the second computer Ditto for linked files (e.g Word documents that link to data in a separate Excel file), playlists, and any files that reference the exact file locations of other files. This applies even to the drive that your files are stored on. If your files are stored on C: on one computer, make sure they’re stored on C: on all your computers. Otherwise all your shortcuts, playlists and linked files will stop working! Tip #13. Create an “Inbox” Folder Create yourself a folder where you store all files that you’re currently working on, or that you haven’t gotten around to filing yet. You can think of this folder as your “to-do” list. You can call it “Inbox” (making it the same metaphor as your email system), or “Work”, or “To-Do”, or “Scratch”, or whatever name makes sense to you. It doesn’t matter what you call it – just make sure you have one! Once you have finished working on a file, you then move it from the “Inbox” to its correct location within your organizational structure. You may want to use your Desktop as this “Inbox” folder. Rightly or wrongly, most people do. It’s not a bad place to put such files, but be careful: If you do decide that your Desktop represents your “to-do” list, then make sure that no other files find their way there. In other words, make sure that your “Inbox”, wherever it is, Desktop or otherwise, is kept free of junk – stray files that don’t belong there. So where should you put this folder, which, almost by definition, lives outside the structure of the rest of your filing system? Well, first and foremost, it has to be somewhere handy. This will be one of your most-visited folders, so convenience is key. Putting it on the Desktop is a great option – especially if you don’t have any other folders on your Desktop: the folder then becomes supremely easy to find in Windows Explorer: You would then create shortcuts to this folder in convenient spots all over your computer (“Favorite Links”, “Quick Launch”, etc). Tip #14. Ensure You have Only One “Inbox” Folder Once you’ve created your “Inbox” folder, don’t use any other folder location as your “to-do list”. Throw every incoming or created file into the Inbox folder as you create/receive it. This keeps the rest of your computer pristine and free of randomly created or downloaded junk. The last thing you want to be doing is checking multiple folders to see all your current tasks and projects. Gather them all together into one folder. Here are some tips to help ensure you only have one Inbox: Set the default “save” location of all your programs to this folder. Set the default “download” location for your browser to this folder. If this folder is not your desktop (recommended) then also see if you can make a point of not putting “to-do” files on your desktop. This keeps your desktop uncluttered and Zen-like: (the Inbox folder is in the bottom-right corner) Tip #15. Be Vigilant about Clearing Your “Inbox” Folder This is one of the keys to staying organized. If you let your “Inbox” overflow (i.e. allow there to be more than, say, 30 files or folders in there), then you’re probably going to start feeling like you’re overwhelmed: You’re not keeping up with your to-do list. Once your Inbox gets beyond a certain point (around 30 files, studies have shown), then you’ll simply start to avoid it. You may continue to put files in there, but you’ll be scared to look at it, fearing the “out of control” feeling that all overworked, chaotic or just plain disorganized people regularly feel. So, here’s what you can do: Visit your Inbox/to-do folder regularly (at least five times per day). Scan the folder regularly for files that you have completed working on and are ready for filing. File them immediately. Make it a source of pride to keep the number of files in this folder as small as possible. If you value peace of mind, then make the emptiness of this folder one of your highest (computer) priorities If you know that a particular file has been in the folder for more than, say, six weeks, then admit that you’re not actually going to get around to processing it, and move it to its final resting place. Tip #16. File Everything Immediately, and Use Shortcuts for Your Active Projects As soon as you create, receive or download a new file, store it away in its “correct” folder immediately. Then, whenever you need to work on it (possibly straight away), create a shortcut to it in your “Inbox” (“to-do”) folder or your desktop. That way, all your files are always in their “correct” locations, yet you still have immediate, convenient access to your current, active files. When you finish working on a file, simply delete the shortcut. Ideally, your “Inbox” folder – and your Desktop – should contain no actual files or folders. They should simply contain shortcuts. Tip #17. Use Directory Symbolic Links (or Junctions) to Maintain One Unified Folder Structure Using this tip, we can get around a potential hiccup that we can run into when creating our organizational structure – the issue of having more than one drive on our computer (C:, D:, etc). We might have files we need to store on the D: drive for space reasons, and yet want to base our organized folder structure on the C: drive (or vice-versa). Your chosen organizational structure may dictate that all your files must be accessed from the C: drive (for example, the root folder of all your files may be something like C:\Files). And yet you may still have a D: drive and wish to take advantage of the hundreds of spare Gigabytes that it offers. Did you know that it’s actually possible to store your files on the D: drive and yet access them as if they were on the C: drive? And no, we’re not talking about shortcuts here (although the concept is very similar). By using the shell command mklink, you can essentially take a folder that lives on one drive and create an alias for it on a different drive (you can do lots more than that with mklink – for a full rundown on this programs capabilities, see our dedicated article). These aliases are called directory symbolic links (and used to be known as junctions). You can think of them as “virtual” folders. They function exactly like regular folders, except they’re physically located somewhere else. For example, you may decide that your entire D: drive contains your complete organizational file structure, but that you need to reference all those files as if they were on the C: drive, under C:\Files. If that was the case you could create C:\Files as a directory symbolic link – a link to D:, as follows: mklink /d c:\files d:\ Or it may be that the only files you wish to store on the D: drive are your movie collection. You could locate all your movie files in the root of your D: drive, and then link it to C:\Files\Media\Movies, as follows: mklink /d c:\files\media\movies d:\ (Needless to say, you must run these commands from a command prompt – click the Start button, type cmd and press Enter) Tip #18. Customize Your Folder Icons This is not strictly speaking an organizational tip, but having unique icons for each folder does allow you to more quickly visually identify which folder is which, and thus saves you time when you’re finding files. An example is below (from my folder that contains all files downloaded from the Internet): To learn how to change your folder icons, please refer to our dedicated article on the subject. Tip #19. Tidy Your Start Menu The Windows Start Menu is usually one of the messiest parts of any Windows computer. Every program you install seems to adopt a completely different approach to placing icons in this menu. Some simply put a single program icon. Others create a folder based on the name of the software. And others create a folder based on the name of the software manufacturer. It’s chaos, and can make it hard to find the software you want to run. Thankfully we can avoid this chaos with useful operating system features like Quick Launch, the Superbar or pinned start menu items. Even so, it would make a lot of sense to get into the guts of the Start Menu itself and give it a good once-over. All you really need to decide is how you’re going to organize your applications. A structure based on the purpose of the application is an obvious candidate. Below is an example of one such structure: In this structure, Utilities means software whose job it is to keep the computer itself running smoothly (configuration tools, backup software, Zip programs, etc). Applications refers to any productivity software that doesn’t fit under the headings Multimedia, Graphics, Internet, etc. In case you’re not aware, every icon in your Start Menu is a shortcut and can be manipulated like any other shortcut (copied, moved, deleted, etc). With the Windows Start Menu (all version of Windows), Microsoft has decided that there be two parallel folder structures to store your Start Menu shortcuts. One for you (the logged-in user of the computer) and one for all users of the computer. Having two parallel structures can often be redundant: If you are the only user of the computer, then having two parallel structures is totally redundant. Even if you have several users that regularly log into the computer, most of your installed software will need to be made available to all users, and should thus be moved out of the “just you” version of the Start Menu and into the “all users” area. To take control of your Start Menu, so you can start organizing it, you’ll need to know how to access the actual folders and shortcut files that make up the Start Menu (both versions of it). To find these folders and files, click the Start button and then right-click on the All Programs text (Windows XP users should right-click on the Start button itself): The Open option refers to the “just you” version of the Start Menu, while the Open All Users option refers to the “all users” version. Click on the one you want to organize. A Windows Explorer window then opens with your chosen version of the Start Menu selected. From there it’s easy. Double-click on the Programs folder and you’ll see all your folders and shortcuts. Now you can delete/rename/move until it’s just the way you want it. Note: When you’re reorganizing your Start Menu, you may want to have two Explorer windows open at the same time – one showing the “just you” version and one showing the “all users” version. You can drag-and-drop between the windows. Tip #20. Keep Your Start Menu Tidy Once you have a perfectly organized Start Menu, try to be a little vigilant about keeping it that way. Every time you install a new piece of software, the icons that get created will almost certainly violate your organizational structure. So to keep your Start Menu pristine and organized, make sure you do the following whenever you install a new piece of software: Check whether the software was installed into the “just you” area of the Start Menu, or the “all users” area, and then move it to the correct area. Remove all the unnecessary icons (like the “Read me” icon, the “Help” icon (you can always open the help from within the software itself when it’s running), the “Uninstall” icon, the link(s)to the manufacturer’s website, etc) Rename the main icon(s) of the software to something brief that makes sense to you. For example, you might like to rename Microsoft Office Word 2010 to simply Word Move the icon(s) into the correct folder based on your Start Menu organizational structure And don’t forget: when you uninstall a piece of software, the software’s uninstall routine is no longer going to be able to remove the software’s icon from the Start Menu (because you moved and/or renamed it), so you’ll need to remove that icon manually. Tip #21. Tidy C:\ The root of your C: drive (C:\) is a common dumping ground for files and folders – both by the users of your computer and by the software that you install on your computer. It can become a mess. There’s almost no software these days that requires itself to be installed in C:\. 99% of the time it can and should be installed into C:\Program Files. And as for your own files, well, it’s clear that they can (and almost always should) be stored somewhere else. In an ideal world, your C:\ folder should look like this (on Windows 7): Note that there are some system files and folders in C:\ that are usually and deliberately “hidden” (such as the Windows virtual memory file pagefile.sys, the boot loader file bootmgr, and the System Volume Information folder). Hiding these files and folders is a good idea, as they need to stay where they are and are almost never needed to be opened or even seen by you, the user. Hiding them prevents you from accidentally messing with them, and enhances your sense of order and well-being when you look at your C: drive folder. Tip #22. Tidy Your Desktop The Desktop is probably the most abused part of a Windows computer (from an organization point of view). It usually serves as a dumping ground for all incoming files, as well as holding icons to oft-used applications, plus some regularly opened files and folders. It often ends up becoming an uncontrolled mess. See if you can avoid this. Here’s why… Application icons (Word, Internet Explorer, etc) are often found on the Desktop, but it’s unlikely that this is the optimum place for them. The “Quick Launch” bar (or the Superbar in Windows 7) is always visible and so represents a perfect location to put your icons. You’ll only be able to see the icons on your Desktop when all your programs are minimized. It might be time to get your application icons off your desktop… You may have decided that the Inbox/To-do folder on your computer (see tip #13, above) should be your Desktop. If so, then enough said. Simply be vigilant about clearing it and preventing it from being polluted by junk files (see tip #15, above). On the other hand, if your Desktop is not acting as your “Inbox” folder, then there’s no reason for it to have any data files or folders on it at all, except perhaps a couple of shortcuts to often-opened files and folders (either ongoing or current projects). Everything else should be moved to your “Inbox” folder. In an ideal world, it might look like this: Tip #23. Move Permanent Items on Your Desktop Away from the Top-Left Corner When files/folders are dragged onto your desktop in a Windows Explorer window, or when shortcuts are created on your Desktop from Internet Explorer, those icons are always placed in the top-left corner – or as close as they can get. If you have other files, folders or shortcuts that you keep on the Desktop permanently, then it’s a good idea to separate these permanent icons from the transient ones, so that you can quickly identify which ones the transients are. An easy way to do this is to move all your permanent icons to the right-hand side of your Desktop. That should keep them separated from incoming items. Tip #24. Synchronize If you have more than one computer, you’ll almost certainly want to share files between them. If the computers are permanently attached to the same local network, then there’s no need to store multiple copies of any one file or folder – shortcuts will suffice. However, if the computers are not always on the same network, then you will at some point need to copy files between them. For files that need to permanently live on both computers, the ideal way to do this is to synchronize the files, as opposed to simply copying them. We only have room here to write a brief summary of synchronization, not a full article. In short, there are several different types of synchronization: Where the contents of one folder are accessible anywhere, such as with Dropbox Where the contents of any number of folders are accessible anywhere, such as with Windows Live Mesh Where any files or folders from anywhere on your computer are synchronized with exactly one other computer, such as with the Windows “Briefcase”, Microsoft SyncToy, or (much more powerful, yet still free) SyncBack from 2BrightSparks. This only works when both computers are on the same local network, at least temporarily. A great advantage of synchronization solutions is that once you’ve got it configured the way you want it, then the sync process happens automatically, every time. Click a button (or schedule it to happen automatically) and all your files are automagically put where they’re supposed to be. If you maintain the same file and folder structure on both computers, then you can also sync files depend upon the correct location of other files, like shortcuts, playlists and office documents that link to other office documents, and the synchronized files still work on the other computer! Tip #25. Hide Files You Never Need to See If you have your files well organized, you will often be able to tell if a file is out of place just by glancing at the contents of a folder (for example, it should be pretty obvious if you look in a folder that contains all the MP3s from one music CD and see a Word document in there). This is a good thing – it allows you to determine if there are files out of place with a quick glance. Yet sometimes there are files in a folder that seem out of place but actually need to be there, such as the “folder art” JPEGs in music folders, and various files in the root of the C: drive. If such files never need to be opened by you, then a good idea is to simply hide them. Then, the next time you glance at the folder, you won’t have to remember whether that file was supposed to be there or not, because you won’t see it at all! To hide a file, simply right-click on it and choose Properties: Then simply tick the Hidden tick-box: Tip #26. Keep Every Setup File These days most software is downloaded from the Internet. Whenever you download a piece of software, keep it. You’ll never know when you need to reinstall the software. Further, keep with it an Internet shortcut that links back to the website where you originally downloaded it, in case you ever need to check for updates. See tip #33 below for a full description of the excellence of organizing your setup files. Tip #27. Try to Minimize the Number of Folders that Contain Both Files and Sub-folders Some of the folders in your organizational structure will contain only files. Others will contain only sub-folders. And you will also have some folders that contain both files and sub-folders. You will notice slight improvements in how long it takes you to locate a file if you try to avoid this third type of folder. It’s not always possible, of course – you’ll always have some of these folders, but see if you can avoid it. One way of doing this is to take all the leftover files that didn’t end up getting stored in a sub-folder and create a special “Miscellaneous” or “Other” folder for them. Tip #28. Starting a Filename with an Underscore Brings it to the Top of a List Further to the previous tip, if you name that “Miscellaneous” or “Other” folder in such a way that its name begins with an underscore “_”, then it will appear at the top of the list of files/folders. The screenshot below is an example of this. Each folder in the list contains a set of digital photos. The folder at the top of the list, _Misc, contains random photos that didn’t deserve their own dedicated folder: Tip #29. Clean Up those CD-ROMs and (shudder!) Floppy Disks Have you got a pile of CD-ROMs stacked on a shelf of your office? Old photos, or files you archived off onto CD-ROM (or even worse, floppy disks!) because you didn’t have enough disk space at the time? In the meantime have you upgraded your computer and now have 500 Gigabytes of space you don’t know what to do with? If so, isn’t it time you tidied up that stack of disks and filed them into your gorgeous new folder structure? So what are you waiting for? Bite the bullet, copy them all back onto your computer, file them in their appropriate folders, and then back the whole lot up onto a shiny new 1000Gig external hard drive! Useful Folders to Create This next section suggests some useful folders that you might want to create within your folder structure. I’ve personally found them to be indispensable. The first three are all about convenience – handy folders to create and then put somewhere that you can always access instantly. For each one, it’s not so important where the actual folder is located, but it’s very important where you put the shortcut(s) to the folder. You might want to locate the shortcuts: On your Desktop In your “Quick Launch” area (or pinned to your Windows 7 Superbar) In your Windows Explorer “Favorite Links” area Tip #30. Create an “Inbox” (“To-Do”) Folder This has already been mentioned in depth (see tip #13), but we wanted to reiterate its importance here. This folder contains all the recently created, received or downloaded files that you have not yet had a chance to file away properly, and it also may contain files that you have yet to process. In effect, it becomes a sort of “to-do list”. It doesn’t have to be called “Inbox” – you can call it whatever you want. Tip #31. Create a Folder where Your Current Projects are Collected Rather than going hunting for them all the time, or dumping them all on your desktop, create a special folder where you put links (or work folders) for each of the projects you’re currently working on. You can locate this folder in your “Inbox” folder, on your desktop, or anywhere at all – just so long as there’s a way of getting to it quickly, such as putting a link to it in Windows Explorer’s “Favorite Links” area: Tip #32. Create a Folder for Files and Folders that You Regularly Open You will always have a few files that you open regularly, whether it be a spreadsheet of your current accounts, or a favorite playlist. These are not necessarily “current projects”, rather they’re simply files that you always find yourself opening. Typically such files would be located on your desktop (or even better, shortcuts to those files). Why not collect all such shortcuts together and put them in their own special folder? As with the “Current Projects” folder (above), you would want to locate that folder somewhere convenient. Below is an example of a folder called “Quick links”, with about seven files (shortcuts) in it, that is accessible through the Windows Quick Launch bar: See tip #37 below for a full explanation of the power of the Quick Launch bar. Tip #33. Create a “Set-ups” Folder A typical computer has dozens of applications installed on it. For each piece of software, there are often many different pieces of information you need to keep track of, including: The original installation setup file(s). This can be anything from a simple 100Kb setup.exe file you downloaded from a website, all the way up to a 4Gig ISO file that you copied from a DVD-ROM that you purchased. The home page of the software manufacturer (in case you need to look up something on their support pages, their forum or their online help) The page containing the download link for your actual file (in case you need to re-download it, or download an upgraded version) The serial number Your proof-of-purchase documentation Any other template files, plug-ins, themes, etc that also need to get installed For each piece of software, it’s a great idea to gather all of these files together and put them in a single folder. The folder can be the name of the software (plus possibly a very brief description of what it’s for – in case you can’t remember what the software does based in its name). Then you would gather all of these folders together into one place, and call it something like “Software” or “Setups”. If you have enough of these folders (I have several hundred, being a geek, collected over 20 years), then you may want to further categorize them. My own categorization structure is based on “platform” (operating system): The last seven folders each represents one platform/operating system, while _Operating Systems contains set-up files for installing the operating systems themselves. _Hardware contains ROMs for hardware I own, such as routers. Within the Windows folder (above), you can see the beginnings of the vast library of software I’ve compiled over the years: An example of a typical application folder looks like this: Tip #34. Have a “Settings” Folder We all know that our documents are important. So are our photos and music files. We save all of these files into folders, and then locate them afterwards and double-click on them to open them. But there are many files that are important to us that can’t be saved into folders, and then searched for and double-clicked later on. These files certainly contain important information that we need, but are often created internally by an application, and saved wherever that application feels is appropriate. A good example of this is the “PST” file that Outlook creates for us and uses to store all our emails, contacts, appointments and so forth. Another example would be the collection of Bookmarks that Firefox stores on your behalf. And yet another example would be the customized settings and configuration files of our all our software. Granted, most Windows programs store their configuration in the Registry, but there are still many programs that use configuration files to store their settings. Imagine if you lost all of the above files! And yet, when people are backing up their computers, they typically only back up the files they know about – those that are stored in the “My Documents” folder, etc. If they had a hard disk failure or their computer was lost or stolen, their backup files would not include some of the most vital files they owned. Also, when migrating to a new computer, it’s vital to ensure that these files make the journey. It can be a very useful idea to create yourself a folder to store all your “settings” – files that are important to you but which you never actually search for by name and double-click on to open them. Otherwise, next time you go to set up a new computer just the way you want it, you’ll need to spend hours recreating the configuration of your previous computer! So how to we get our important files into this folder? Well, we have a few options: Some programs (such as Outlook and its PST files) allow you to place these files wherever you want. If you delve into the program’s options, you will find a setting somewhere that controls the location of the important settings files (or “personal storage” – PST – when it comes to Outlook) Some programs do not allow you to change such locations in any easy way, but if you get into the Registry, you can sometimes find a registry key that refers to the location of the file(s). Simply move the file into your Settings folder and adjust the registry key to refer to the new location. Some programs stubbornly refuse to allow their settings files to be placed anywhere other then where they stipulate. When faced with programs like these, you have three choices: (1) You can ignore those files, (2) You can copy the files into your Settings folder (let’s face it – settings don’t change very often), or (3) you can use synchronization software, such as the Windows Briefcase, to make synchronized copies of all your files in your Settings folder. All you then have to do is to remember to run your sync software periodically (perhaps just before you run your backup software!). There are some other things you may decide to locate inside this new “Settings” folder: Exports of registry keys (from the many applications that store their configurations in the Registry). This is useful for backup purposes or for migrating to a new computer Notes you’ve made about all the specific customizations you have made to a particular piece of software (so that you’ll know how to do it all again on your next computer) Shortcuts to webpages that detail how to tweak certain aspects of your operating system or applications so they are just the way you like them (such as how to remove the words “Shortcut to” from the beginning of newly created shortcuts). In other words, you’d want to create shortcuts to half the pages on the How-To Geek website! Here’s an example of a “Settings” folder: Windows Features that Help with Organization This section details some of the features of Microsoft Windows that are a boon to anyone hoping to stay optimally organized. Tip #35. Use the “Favorite Links” Area to Access Oft-Used Folders Once you’ve created your great new filing system, work out which folders you access most regularly, or which serve as great starting points for locating the rest of the files in your folder structure, and then put links to those folders in your “Favorite Links” area of the left-hand side of the Windows Explorer window (simply called “Favorites” in Windows 7): Some ideas for folders you might want to add there include: Your “Inbox” folder (or whatever you’ve called it) – most important! The base of your filing structure (e.g. C:\Files) A folder containing shortcuts to often-accessed folders on other computers around the network (shown above as Network Folders) A folder containing shortcuts to your current projects (unless that folder is in your “Inbox” folder) Getting folders into this area is very simple – just locate the folder you’re interested in and drag it there! Tip #36. Customize the Places Bar in the File/Open and File/Save Boxes Consider the screenshot below: The highlighted icons (collectively known as the “Places Bar”) can be customized to refer to any folder location you want, allowing instant access to any part of your organizational structure. Note: These File/Open and File/Save boxes have been superseded by new versions that use the Windows Vista/Windows 7 “Favorite Links”, but the older versions (shown above) are still used by a surprisingly large number of applications. The easiest way to customize these icons is to use the Group Policy Editor, but not everyone has access to this program. If you do, open it up and navigate to: User Configuration > Administrative Templates > Windows Components > Windows Explorer > Common Open File Dialog If you don’t have access to the Group Policy Editor, then you’ll need to get into the Registry. Navigate to: HKEY_CURRENT_USER \ Software \ Microsoft \ Windows \ CurrentVersion \ Policies \ comdlg32 \ Placesbar It should then be easy to make the desired changes. Log off and log on again to allow the changes to take effect. Tip #37. Use the Quick Launch Bar as a Application and File Launcher That Quick Launch bar (to the right of the Start button) is a lot more useful than people give it credit for. Most people simply have half a dozen icons in it, and use it to start just those programs. But it can actually be used to instantly access just about anything in your filing system: For complete instructions on how to set this up, visit our dedicated article on this topic. Tip #38. Put a Shortcut to Windows Explorer into Your Quick Launch Bar This is only necessary in Windows Vista and Windows XP. The Microsoft boffins finally got wise and added it to the Windows 7 Superbar by default. Windows Explorer – the program used for managing your files and folders – is one of the most useful programs in Windows. Anyone who considers themselves serious about being organized needs instant access to this program at any time. A great place to create a shortcut to this program is in the Windows XP and Windows Vista “Quick Launch” bar: To get it there, locate it in your Start Menu (usually under “Accessories”) and then right-drag it down into your Quick Launch bar (and create a copy). Tip #39. Customize the Starting Folder for Your Windows 7 Explorer Superbar Icon If you’re on Windows 7, your Superbar will include a Windows Explorer icon. Clicking on the icon will launch Windows Explorer (of course), and will start you off in your “Libraries” folder. Libraries may be fine as a starting point, but if you have created yourself an “Inbox” folder, then it would probably make more sense to start off in this folder every time you launch Windows Explorer. To change this default/starting folder location, then first right-click the Explorer icon in the Superbar, and then right-click Properties:Then, in Target field of the Windows Explorer Properties box that appears, type %windir%\explorer.exe followed by the path of the folder you wish to start in. For example: %windir%\explorer.exe C:\Files If that folder happened to be on the Desktop (and called, say, “Inbox”), then you would use the following cleverness: %windir%\explorer.exe shell:desktop\Inbox Then click OK and test it out. Tip #40. Ummmmm…. No, that’s it. I can’t think of another one. That’s all of the tips I can come up with. I only created this one because 40 is such a nice round number… Case Study – An Organized PC To finish off the article, I have included a few screenshots of my (main) computer (running Vista). The aim here is twofold: To give you a sense of what it looks like when the above, sometimes abstract, tips are applied to a real-life computer, and To offer some ideas about folders and structure that you may want to steal to use on your own PC. Let’s start with the C: drive itself. Very minimal. All my files are contained within C:\Files. I’ll confine the rest of the case study to this folder: That folder contains the following: Mark: My personal files VC: My business (Virtual Creations, Australia) Others contains files created by friends and family Data contains files from the rest of the world (can be thought of as “public” files, usually downloaded from the Net) Settings is described above in tip #34 The Data folder contains the following sub-folders: Audio: Radio plays, audio books, podcasts, etc Development: Programmer and developer resources, sample source code, etc (see below) Humour: Jokes, funnies (those emails that we all receive) Movies: Downloaded and ripped movies (all legal, of course!), their scripts, DVD covers, etc. Music: (see below) Setups: Installation files for software (explained in full in tip #33) System: (see below) TV: Downloaded TV shows Writings: Books, instruction manuals, etc (see below) The Music folder contains the following sub-folders: Album covers: JPEG scans Guitar tabs: Text files of guitar sheet music Lists: e.g. “Top 1000 songs of all time” Lyrics: Text files MIDI: Electronic music files MP3 (representing 99% of the Music folder): MP3s, either ripped from CDs or downloaded, sorted by artist/album name Music Video: Video clips Sheet Music: usually PDFs The Data\Writings folder contains the following sub-folders: (all pretty self-explanatory) The Data\Development folder contains the following sub-folders: Again, all pretty self-explanatory (if you’re a geek) The Data\System folder contains the following sub-folders: These are usually themes, plug-ins and other downloadable program-specific resources. The Mark folder contains the following sub-folders: From Others: Usually letters that other people (friends, family, etc) have written to me For Others: Letters and other things I have created for other people Green Book: None of your business Playlists: M3U files that I have compiled of my favorite songs (plus one M3U playlist file for every album I own) Writing: Fiction, philosophy and other musings of mine Mark Docs: Shortcut to C:\Users\Mark Settings: Shortcut to C:\Files\Settings\Mark The Others folder contains the following sub-folders: The VC (Virtual Creations, my business – I develop websites) folder contains the following sub-folders: And again, all of those are pretty self-explanatory. Conclusion These tips have saved my sanity and helped keep me a productive geek, but what about you? What tips and tricks do you have to keep your files organized? Please share them with us in the comments. Come on, don’t be shy… Similar Articles Productive Geek Tips Fix For When Windows Explorer in Vista Stops Showing File NamesWhy Did Windows Vista’s Music Folder Icon Turn Yellow?Print or Create a Text File List of the Contents in a Directory the Easy WayCustomize the Windows 7 or Vista Send To MenuAdd Copy To / Move To on Windows 7 or Vista Right-Click Menu TouchFreeze Alternative in AutoHotkey The Icy Undertow Desktop Windows Home Server – Backup to LAN The Clear & Clean Desktop Use This Bookmarklet to Easily Get Albums Use AutoHotkey to Assign a Hotkey to a Specific Window Latest Software Reviews Tinyhacker Random Tips Acronis Online Backup DVDFab 6 Revo Uninstaller Pro Registry Mechanic 9 for Windows Track Daily Goals With 42Goals Video Toolbox is a Superb Online Video Editor Fun with 47 charts and graphs Tomorrow is Mother’s Day Check the Average Speed of YouTube Videos You’ve Watched OutlookStatView Scans and Displays General Usage Statistics

Search Results

Search found 1226 results on 50 pages for 'improvement'.

Page 49/50 | < Previous Page | 45 46 47 48 49 50 | Next Page >

- by bobloki

- by bobloki

- by Bernt Habermeier

- by tjb

- by Dave Jarvis

- by Eric Charles

- by Doug

- by mark

- by Rick Strahl

- by Paul White

- by danx

- by Rick Strahl

- by Michael B. McLaughlin

- by Rick Strahl

- by Aristos

- by algro

- by Oleg Abrazhaev

- by mseika

- by Shaun

- by Rick Strahl

- by Rick Strahl

- by Rick Strahl

- by eagle

- by Mark Virtue

< Previous Page | 45 46 47 48 49 50 | Next Page >