Search Results

Search found 3874 results on 155 pages for 'differed execution'.

Page 75/155 | < Previous Page | 71 72 73 74 75 76 77 78 79 80 81 82  | Next Page >

  • Heaps of Trouble?

    - by Paul White NZ
    If you’re not already a regular reader of Brad Schulz’s blog, you’re missing out on some great material.  In his latest entry, he is tasked with optimizing a query run against tables that have no indexes at all.  The problem is, predictably, that performance is not very good.  The catch is that we are not allowed to create any indexes (or even new statistics) as part of our optimization efforts. In this post, I’m going to look at the problem from a slightly different angle, and present an alternative solution to the one Brad found.  Inevitably, there’s going to be some overlap between our entries, and while you don’t necessarily need to read Brad’s post before this one, I do strongly recommend that you read it at some stage; he covers some important points that I won’t cover again here. The Example We’ll use data from the AdventureWorks database, copied to temporary unindexed tables.  A script to create these structures is shown below: CREATE TABLE #Custs ( CustomerID INTEGER NOT NULL, TerritoryID INTEGER NULL, CustomerType NCHAR(1) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, ); GO CREATE TABLE #Prods ( ProductMainID INTEGER NOT NULL, ProductSubID INTEGER NOT NULL, ProductSubSubID INTEGER NOT NULL, Name NVARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, ); GO CREATE TABLE #OrdHeader ( SalesOrderID INTEGER NOT NULL, OrderDate DATETIME NOT NULL, SalesOrderNumber NVARCHAR(25) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, CustomerID INTEGER NOT NULL, ); GO CREATE TABLE #OrdDetail ( SalesOrderID INTEGER NOT NULL, OrderQty SMALLINT NOT NULL, LineTotal NUMERIC(38,6) NOT NULL, ProductMainID INTEGER NOT NULL, ProductSubID INTEGER NOT NULL, ProductSubSubID INTEGER NOT NULL, ); GO INSERT #Custs ( CustomerID, TerritoryID, CustomerType ) SELECT C.CustomerID, C.TerritoryID, C.CustomerType FROM AdventureWorks.Sales.Customer C WITH (TABLOCK); GO INSERT #Prods ( ProductMainID, ProductSubID, ProductSubSubID, Name ) SELECT P.ProductID, P.ProductID, P.ProductID, P.Name FROM AdventureWorks.Production.Product P WITH (TABLOCK); GO INSERT #OrdHeader ( SalesOrderID, OrderDate, SalesOrderNumber, CustomerID ) SELECT H.SalesOrderID, H.OrderDate, H.SalesOrderNumber, H.CustomerID FROM AdventureWorks.Sales.SalesOrderHeader H WITH (TABLOCK); GO INSERT #OrdDetail ( SalesOrderID, OrderQty, LineTotal, ProductMainID, ProductSubID, ProductSubSubID ) SELECT D.SalesOrderID, D.OrderQty, D.LineTotal, D.ProductID, D.ProductID, D.ProductID FROM AdventureWorks.Sales.SalesOrderDetail D WITH (TABLOCK); The query itself is a simple join of the four tables: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #Prods P JOIN #OrdDetail D ON P.ProductMainID = D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID JOIN #OrdHeader H ON D.SalesOrderID = H.SalesOrderID JOIN #Custs C ON H.CustomerID = C.CustomerID ORDER BY P.ProductMainID ASC OPTION (RECOMPILE, MAXDOP 1); Remember that these tables have no indexes at all, and only the single-column sampled statistics SQL Server automatically creates (assuming default settings).  The estimated query plan produced for the test query looks like this (click to enlarge): The Problem The problem here is one of cardinality estimation – the number of rows SQL Server expects to find at each step of the plan.  The lack of indexes and useful statistical information means that SQL Server does not have the information it needs to make a good estimate.  Every join in the plan shown above estimates that it will produce just a single row as output.  Brad covers the factors that lead to the low estimates in his post. In reality, the join between the #Prods and #OrdDetail tables will produce 121,317 rows.  It should not surprise you that this has rather dire consequences for the remainder of the query plan.  In particular, it makes a nonsense of the optimizer’s decision to use Nested Loops to join to the two remaining tables.  Instead of scanning the #OrdHeader and #Custs tables once (as it expected), it has to perform 121,317 full scans of each.  The query takes somewhere in the region of twenty minutes to run to completion on my development machine. A Solution At this point, you may be thinking the same thing I was: if we really are stuck with no indexes, the best we can do is to use hash joins everywhere. We can force the exclusive use of hash joins in several ways, the two most common being join and query hints.  A join hint means writing the query using the INNER HASH JOIN syntax; using a query hint involves adding OPTION (HASH JOIN) at the bottom of the query.  The difference is that using join hints also forces the order of the join, whereas the query hint gives the optimizer freedom to reorder the joins at its discretion. Adding the OPTION (HASH JOIN) hint results in this estimated plan: That produces the correct output in around seven seconds, which is quite an improvement!  As a purely practical matter, and given the rigid rules of the environment we find ourselves in, we might leave things there.  (We can improve the hashing solution a bit – I’ll come back to that later on). Faster Nested Loops It might surprise you to hear that we can beat the performance of the hash join solution shown above using nested loops joins exclusively, and without breaking the rules we have been set. The key to this part is to realize that a condition like (A = B) can be expressed as (A <= B) AND (A >= B).  Armed with this tremendous new insight, we can rewrite the join predicates like so: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #OrdDetail D JOIN #OrdHeader H ON D.SalesOrderID >= H.SalesOrderID AND D.SalesOrderID <= H.SalesOrderID JOIN #Custs C ON H.CustomerID >= C.CustomerID AND H.CustomerID <= C.CustomerID JOIN #Prods P ON P.ProductMainID >= D.ProductMainID AND P.ProductMainID <= D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID ORDER BY D.ProductMainID OPTION (RECOMPILE, LOOP JOIN, MAXDOP 1, FORCE ORDER); I’ve also added LOOP JOIN and FORCE ORDER query hints to ensure that only nested loops joins are used, and that the tables are joined in the order they appear.  The new estimated execution plan is: This new query runs in under 2 seconds. Why Is It Faster? The main reason for the improvement is the appearance of the eager Index Spools, which are also known as index-on-the-fly spools.  If you read my Inside The Optimiser series you might be interested to know that the rule responsible is called JoinToIndexOnTheFly. An eager index spool consumes all rows from the table it sits above, and builds a index suitable for the join to seek on.  Taking the index spool above the #Custs table as an example, it reads all the CustomerID and TerritoryID values with a single scan of the table, and builds an index keyed on CustomerID.  The term ‘eager’ means that the spool consumes all of its input rows when it starts up.  The index is built in a work table in tempdb, has no associated statistics, and only exists until the query finishes executing. The result is that each unindexed table is only scanned once, and just for the columns necessary to build the temporary index.  From that point on, every execution of the inner side of the join is answered by a seek on the temporary index – not the base table. A second optimization is that the sort on ProductMainID (required by the ORDER BY clause) is performed early, on just the rows coming from the #OrdDetail table.  The optimizer has a good estimate for the number of rows it needs to sort at that stage – it is just the cardinality of the table itself.  The accuracy of the estimate there is important because it helps determine the memory grant given to the sort operation.  Nested loops join preserves the order of rows on its outer input, so sorting early is safe.  (Hash joins do not preserve order in this way, of course). The extra lazy spool on the #Prods branch is a further optimization that avoids executing the seek on the temporary index if the value being joined (the ‘outer reference’) hasn’t changed from the last row received on the outer input.  It takes advantage of the fact that rows are still sorted on ProductMainID, so if duplicates exist, they will arrive at the join operator one after the other. The optimizer is quite conservative about introducing index spools into a plan, because creating and dropping a temporary index is a relatively expensive operation.  It’s presence in a plan is often an indication that a useful index is missing. I want to stress that I rewrote the query in this way primarily as an educational exercise – I can’t imagine having to do something so horrible to a production system. Improving the Hash Join I promised I would return to the solution that uses hash joins.  You might be puzzled that SQL Server can create three new indexes (and perform all those nested loops iterations) faster than it can perform three hash joins.  The answer, again, is down to the poor information available to the optimizer.  Let’s look at the hash join plan again: Two of the hash joins have single-row estimates on their build inputs.  SQL Server fixes the amount of memory available for the hash table based on this cardinality estimate, so at run time the hash join very quickly runs out of memory. This results in the join spilling hash buckets to disk, and any rows from the probe input that hash to the spilled buckets also get written to disk.  The join process then continues, and may again run out of memory.  This is a recursive process, which may eventually result in SQL Server resorting to a bailout join algorithm, which is guaranteed to complete eventually, but may be very slow.  The data sizes in the example tables are not large enough to force a hash bailout, but it does result in multiple levels of hash recursion.  You can see this for yourself by tracing the Hash Warning event using the Profiler tool. The final sort in the plan also suffers from a similar problem: it receives very little memory and has to perform multiple sort passes, saving intermediate runs to disk (the Sort Warnings Profiler event can be used to confirm this).  Notice also that because hash joins don’t preserve sort order, the sort cannot be pushed down the plan toward the #OrdDetail table, as in the nested loops plan. Ok, so now we understand the problems, what can we do to fix it?  We can address the hash spilling by forcing a different order for the joins: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #Prods P JOIN #Custs C JOIN #OrdHeader H ON H.CustomerID = C.CustomerID JOIN #OrdDetail D ON D.SalesOrderID = H.SalesOrderID ON P.ProductMainID = D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID ORDER BY D.ProductMainID OPTION (MAXDOP 1, HASH JOIN, FORCE ORDER); With this plan, each of the inputs to the hash joins has a good estimate, and no hash recursion occurs.  The final sort still suffers from the one-row estimate problem, and we get a single-pass sort warning as it writes rows to disk.  Even so, the query runs to completion in three or four seconds.  That’s around half the time of the previous hashing solution, but still not as fast as the nested loops trickery. Final Thoughts SQL Server’s optimizer makes cost-based decisions, so it is vital to provide it with accurate information.  We can’t really blame the performance problems highlighted here on anything other than the decision to use completely unindexed tables, and not to allow the creation of additional statistics. I should probably stress that the nested loops solution shown above is not one I would normally contemplate in the real world.  It’s there primarily for its educational and entertainment value.  I might perhaps use it to demonstrate to the sceptical that SQL Server itself is crying out for an index. Be sure to read Brad’s original post for more details.  My grateful thanks to him for granting permission to reuse some of his material. Paul White Email: [email protected] Twitter: @PaulWhiteNZ

    Read the article

  • SharePoint and COMException (0x80004005): Cannot complete this action

    - by Damon
    I ran into a small issue today working on a deployment.  We were moving a custom ASP.NET control from my development environment into a SharePoint layout page on a staging environment .  I was expecting some minor issues to arise since I had developed the control in an ASP.NET website project, but after getting everything moved over we got an obscure COMException error the that looked like this: Cannot complete this action. Please try again. Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code. Exception Details: System.Runtime.InteropServices.COMException: Cannot complete this action. [COMException (0x80004005): Cannot complete this action. .Lengthy stack trace goes here. Everything in the custom control was built using managed code, so we weren't sure why a COMException would suddenly appear. The control made use of an ITemplate to define its UI, so there was a lot of markup and binding code inside the template. As such, we started taking chunks of the template out of the layout page and eventually the error went away.  It was being caused by a section of code where we were calling a custom utility method inside some binding code: <%# WebUtility.FormatDecimal(.) %> Solution: It turns out that we were missing an Assembly and Import directive at the top of the page to let the page know where to find this method.  After adding these to the page, the error went away and everything worked great.  So a COMException (0x80004005) Cannot complete this action error is just SharePoint's friendly way of letting you know you're missing an assembly or imports reference.

    Read the article

  • Handling EJB/JPA exceptions in a “beautiful” way

    - by Rodrigues, Raphael
    In order to handle JPA exceptions, there are some ways already detailed in lots of blogs. Here, I intend to show one of them, which I consider kind of “beauty”. My use case has a unique constraint, when the User try to create a duplicate value in database. The JPA throws a java.sql.SQLIntegrityConstraintViolationException, and I have to catch it and replace the message. In fact, ADF Business Components framework already has a beautiful solution for this very well documented here . However, for EJB/JPA there's no similar approach. In my case, what I had to do was: 1. Create a custom Error Handler Class in DataBindings file; a. Here is how you accomplish it. 2. Override the reportException method and check if the type of exception exists on property file a. If yes, I change the message and rethrows the exception b. If not, go on the execution The main goal of this approach is whether a new or unhandled Exception was raised, the job is, only create a single entry in bundle property file. Here are pictures step by step: 1. CustomExceptionHandler.java 2. Databindings.cpx 3. Bundle file 4. jspx: 5. Stacktrace: Give your opinion, what did you think about that?

    Read the article

  • Join our webcast: Discover What’s New in Oracle Data Integrator and Oracle GoldenGate

    - by Irem Radzik
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Times New Roman","serif"; mso-fareast-font-family:"Times New Roman";} Data integration team has organized a series of webcasts for this summer. We are kicking it off this Thursday June 30th at 10am PT with a product update webcast: Discover What’s New in Oracle Data Integrator and Oracle GoldenGate. In this webcast you will hear from product management about the new patch updates to both GoldenGate 11g R1 and ODI 11gR1. Jeff Pollock, Sr. Director of Product Management for ODI will talk about the new features in Oracle Data Integrator 11.1.1.5, including the data lineage integration with OBI EE, enhanced web services to support flexible architectures as well as capabilities for efficient object execution such as Load Plans. Jeff will discuss support for complex files and performance enhancements. Chris McAllister, Sr. Director of Product Management for Oracle GoldenGate will cover the new features of Oracle GoldenGate 11.1.1.1 such as increased data security by supporting Oracle Database Advanced Security option, deeper integration with Oracle Database, and the expanded list of heterogeneous databases GoldenGate supports . Chris will also talk about the new Oracle GoldenGate 11gR1 release for HP NonStop platform and will provide information on our strategic direction for product development. Join us this Thursday at 10am PT/ 1pm ET to hear directly from Data Integration Product Management . You can register here for the June 30th webcast as well as for the upcoming ones in our summer webcast series.

    Read the article

  • How to Create a Cinemagraph; Still Photos with Moving Elements

    - by Jason Fitzpatrick
    Cinemagraphs, still photos with moving elements, are quite a trend in photography circles. Unlike jerky animated GIFs, they’re much more fluid and subtle. Learn how to make them with this tutorial. At Photojojo! they have a detailed tutorial outlining how to create a cinemagraph that covers the planning and execution of your image. If you’ve never heard of the process before it’s pretty neat. You take a series of photographs and then, using image editing tools, mask off only the parts of the photo you want to move (like the eyes in the photos sample seen here). Then you blend the photos together and create an animated GIF where in only the small portion of the image actually moves. This is quite different than the jerky animated GIFs that result when people try to turn full motion video into an animation. Hit up the link below to see how you can create your own cinemagraphs. How to Make Cinemagraphs — Still Photos that Move Like Movies! How To Recover After Your Email Password Is CompromisedHow to Clean Your Filthy Keyboard in the Dishwasher (Without Ruining it)Learn How to Make HDR Images in Photoshop or GIMP With a Simple Trick

    Read the article

  • defunct dbus-daemon zombie freezes login for 30 seconds

    - by oldenburgb
    I'm running Ubuntu 12.04.2 LTS (Precise Pangolin) and noticed a 30 second delay whenever i log into my server via ssh or perform any kind of login via sudo on that machine. I can provoke immediate execution by killing the defunct dbus-daemon showing up during the delay: output of ps fax |grep dbus 19222 ? Ss 0:00 dbus-daemon --system --fork --activation=upstart 19752 ? Z 0:00 \_ [dbus-daemon] <defunct> taping into the dbus using dbus-monitor --system i'm getting: signal sender=org.freedesktop.DBus -> dest=(null destination) serial=7 path=/org/freedesktop/DBus; interface=org.freedesktop.DBus; member=NameOwnerChanged string ":1.4" string "" string ":1.4" each login. Stopping the dbus service eliminates this problem but probably causes many other... I'm not running xorg on the machine but the packages are present for X11 forwarding capabilities. I've ruled out the common motd script delay and ssh "UseDNS no" fixes one finds when looking up login delay issues. Many thanks in advance for any help with this, it's been driving me crazy ;-)

    Read the article

  • Node remains in commissioning status

    - by Vinitha
    I have been trying to set up ubuntu cloud 12.04. I'm kind of new to MAAS and ubuntu. Here is what I followed. Have installed MAAS server using the steps provided in https://wiki.ubuntu.com/ServerTeam/MAAS For the node, I installed the Ubuntu 12.04 Server Image on a USB Stick. Then restarted the node and opted to enlist the node via boot media, with PXE. once the process was done, the node was powered off as expected. I manually powered on the node, as my node is not PXE enabled. Result - No node was visible on MAAS UI Since step 2 didn't work, I added the node via maas-cli. command. After the execution of this command I got the node reflected on to my MAAS UI. But the status continues to be in "Commissioning" for a long time. Then I executed "maas-cli maas nodes check-commissioning " and i got "Unrecognised signature: POST check_commissioning". I'm not sure where is the error. Could some one please help me solve this issue. I checked the following log file but found no error related to commissioning (pserv.log / maas.log / celery.log/celery-region.log). I found this entry in my auth.log "Nov 16 18:20:34 ubuntuCloud sshd[4222]: Did not receive identification string from xxx.xx.xx.x" not sure if it indicates anything as the ip that is mentioned is not of the node nor of the MAAS server. I also verified the time on the server and node using date cmd - (at one instance the times are : Server: Fri Nov 16 18:15:51 IST 2012 and Node Fri Nov 16 18:15:43 IST 2012). Not sure if 'date' the right cmd to set the time. I have also check maas_local_settings.py for the MAAS url. I'm not sure what are the logs that need to be verified. Is there any log that can be checked on the Node. Thanks Vinitha

    Read the article

  • What does "error reading login count from pmvarrun" mean?

    - by n3rd
    I have the above mentioned error in my /var/log/auth.log file and just try to figure out if this is a harmelss statement. As far as I understand does pmvarrun tells the system how many active session (e.g. logins) a user has on the system. Full output of auth.log Jan 24 17:44:42 P835 lightdm: pam_unix(lightdm:session): session opened for user lightdm by (uid=0) Jan 24 17:44:42 P835 lightdm: pam_ck_connector(lightdm:session): nox11 mode, ignoring PAM_TTY :0 Jan 24 17:44:49 P835 lightdm: pam_succeed_if(lightdm:auth): requirement "user ingroup nopasswdlogin" not met by user "user" Jan 24 17:44:51 P835 dbus[1289]: [system] Rejected send message, 2 matched rules; type="method_call", sender=":1.31" (uid=104 pid=1882 comm="/usr/lib/indicator-datetime/indicator-datetime-ser") interface="org.freedesktop.DBus.Properties" member="GetAll" error name="(unset)" requested_reply="0" destination=":1.17" (uid=0 pid=1561 comm="/usr/sbin/console-kit-daemon --no-daemon ") Jan 24 17:45:04 P835 lightdm: pam_unix(lightdm:session): session closed for user lightdm Jan 24 17:45:04 P835 lightdm: pam_mount(pam_mount.c:691): received order to close things Jan 24 17:45:04 P835 lightdm: pam_mount(pam_mount.c:693): No volumes to umount Jan 24 17:45:04 P835 lightdm: command: 'pmvarrun' '-u' 'user' '-o' '-1' Jan 24 17:45:04 P835 lightdm: pam_mount(misc.c:38): set_myuid<pre>: (ruid/rgid=0/0, e=0/0) Jan 24 17:45:04 P835 lightdm: pam_mount(misc.c:38): set_myuid<post>: (ruid/rgid=0/0, e=0/0) Jan 24 17:45:04 P835 lightdm: pam_mount(pam_mount.c:438): error reading login count from pmvarrun Jan 24 17:45:04 P835 lightdm: pam_mount(pam_mount.c:728): pam_mount execution complete Jan 24 17:45:08 P835 lightdm: pam_unix(lightdm:session): session opened for user user by (uid=0) Jan 24 17:45:08 P835 lightdm: pam_ck_connector(lightdm:session): nox11 mode, ignoring PAM_TTY :0 Jan 24 17:45:25 P835 polkitd(authority=local): Registered Authentication Agent for unix-session:/org/freedesktop/ConsoleKit/Session2 (system bus name :1.54 [/usr/lib/policykit-1-gnome/polkit-gnome-authentication-agent-1], object path /org/gnome/PolicyKit1/AuthenticationAgent, locale en_US.UTF-8) Jan 24 17:45:47 P835 dbus[1289]: [system] Rejected send message, 2 matched rules; type="method_call", sender=":1.59" (uid=1000 pid=4748 comm="/usr/lib/indicator-datetime/indicator-datetime-ser") interface="org.freedesktop.DBus.Properties" member="GetAll" error name="(unset)" requested_reply="0" destination=":1.17" (uid=0 pid=1561 comm="/usr/sbin/console-kit-daemon --no-daemon ") Thanks for any help

    Read the article

  • SYS2 Scripts Updated – Scripts to monitor database backup, database space usage and memory grants now available

    - by Davide Mauri
    I’ve just released three new scripts of my “sys2” script collection that can be found on CodePlex: Project Page: http://sys2dmvs.codeplex.com/ Source Code Download: http://sys2dmvs.codeplex.com/SourceControl/changeset/view/57732 The three new scripts are the following sys2.database_backup_info.sql sys2.query_memory_grants.sql sys2.stp_get_databases_space_used_info.sql Here’s some more details: database_backup_info This script has been made to quickly check if and when backup was done. It will report the last full, differential and log backup date and time for each database. Along with these information you’ll also get some additional metadata that shows if a database is a read-only database and its recovery model: By default it will check only the last seven days, but you can change this value just specifying how many days back you want to check. To analyze the last seven days, and list only the database with FULL recovery model without a log backup select * from sys2.databases_backup_info(default) where recovery_model = 3 and log_backup = 0 To analyze the last fifteen days, and list only the database with FULL recovery model with a differential backup select * from sys2.databases_backup_info(15) where recovery_model = 3 and diff_backup = 1 I just love this script, I use it every time I need to check that backups are not too old and that t-log backup are correctly scheduled. query_memory_grants This is just a wrapper around sys.dm_exec_query_memory_grants that enriches the default result set with the text of the query for which memory has been granted or is waiting for a memory grant and, optionally, its execution plan stp_get_databases_space_used_info This is a stored procedure that list all the available databases and for each one the overall size, the used space within that size, the maximum size it may reach and the auto grow options. This is another script I use every day in order to be able to monitor, track and forecast database space usage. As usual feedbacks and suggestions are more than welcome!

    Read the article

  • T-SQL Tuesday #33: Trick Shots: Undocumented, Underdocumented, and Unknown Conspiracies!

    - by Most Valuable Yak (Rob Volk)
    Mike Fal (b | t) is hosting this month's T-SQL Tuesday on Trick Shots.  I love this choice because I've been preoccupied with sneaky/tricky/evil SQL Server stuff for a long time and have been presenting on it for the past year.  Mike's directives were "Show us a cool trick or process you developed…It doesn’t have to be useful", which most of my blogging definitely fits, and "Tell us what you learned from this trick…tell us how it gave you insight in to how SQL Server works", which is definitely a new concept.  I've done a lot of reading and watching on SQL Server Internals and even attended training, but sometimes I need to go explore on my own, using my own tools and techniques.  It's an itch I get every few months, and, well, it sure beats workin'. I've found some people to be intimidated by SQL Server's internals, and I'll admit there are A LOT of internals to keep track of, but there are tons of excellent resources that clearly document most of them, and show how knowing even the basics of internals can dramatically improve your database's performance.  It may seem like rocket science, or even brain surgery, but you don't have to be a genius to understand it. Although being an "evil genius" can help you learn some things they haven't told you about. ;) This blog post isn't a traditional "deep dive" into internals, it's more of an approach to find out how a program works.  It utilizes an extremely handy tool from an even more extremely handy suite of tools, Sysinternals.  I'm not the only one who finds Sysinternals useful for SQL Server: Argenis Fernandez (b | t), Microsoft employee and former T-SQL Tuesday host, has an excellent presentation on how to troubleshoot SQL Server using Sysinternals, and I highly recommend it.  Argenis didn't cover the Strings.exe utility, but I'll be using it to "hack" the SQL Server executable (DLL and EXE) files. Please note that I'm not promoting software piracy or applying these techniques to attack SQL Server via internal knowledge. This is strictly educational and doesn't reveal any proprietary Microsoft information.  And since Argenis works for Microsoft and demonstrated Sysinternals with SQL Server, I'll just let him take the blame for it. :P (The truth is I've used Strings.exe on SQL Server before I ever met Argenis.) Once you download and install Strings.exe you can run it from the command line.  For our purposes we'll want to run this in the Binn folder of your SQL Server instance (I'm referencing SQL Server 2012 RTM): cd "C:\Program Files\Microsoft SQL Server\MSSQL11\MSSQL\Binn" C:\Program Files\Microsoft SQL Server\MSSQL11\MSSQL\Binn> strings *sql*.dll > sqldll.txt C:\Program Files\Microsoft SQL Server\MSSQL11\MSSQL\Binn> strings *sql*.exe > sqlexe.txt   I've limited myself to DLLs and EXEs that have "sql" in their names.  There are quite a few more but I haven't examined them in any detail. (Homework assignment for you!) If you run this yourself you'll get 2 text files, one with all the extracted strings from every SQL DLL file, and the other with the SQL EXE strings.  You can open these in Notepad, but you're better off using Notepad++, EditPad, Emacs, Vim or another more powerful text editor, as these will be several megabytes in size. And when you do open it…you'll find…a TON of gibberish.  (If you think that's bad, just try opening the raw DLL or EXE file in Notepad.  And by the way, don't do this in production, or even on a running instance of SQL Server.)  Even if you don't clean up the file, you can still use your editor's search function to find a keyword like "SELECT" or some other item you expect to be there.  As dumb as this sounds, I sometimes spend my lunch break just scanning the raw text for anything interesting.  I'm boring like that. Sometimes though, having these files available can lead to some incredible learning experiences.  For me the most recent time was after reading Joe Sack's post on non-parallel plan reasons.  He mentions a new SQL Server 2012 execution plan element called NonParallelPlanReason, and demonstrates a query that generates "MaxDOPSetToOne".  Joe (formerly on the Microsoft SQL Server product team, so he knows this stuff) mentioned that this new element was not currently documented and tried a few more examples to see what other reasons could be generated. Since I'd already run Strings.exe on the SQL Server DLLs and EXE files, it was easy to run grep/find/findstr for MaxDOPSetToOne on those extracts.  Once I found which files it belonged to (sqlmin.dll) I opened the text to see if the other reasons were listed.  As you can see in my comment on Joe's blog, there were about 20 additional non-parallel reasons.  And while it's not "documentation" of this underdocumented feature, the names are pretty self-explanatory about what can prevent parallel processing. I especially like the ones about cursors – more ammo! - and am curious about the PDW compilation and Cloud DB replication reasons. One reason completely stumped me: NoParallelHekatonPlan.  What the heck is a hekaton?  Google and Wikipedia were vague, and the top results were not in English.  I found one reference to Greek, stating "hekaton" can be translated as "hundredfold"; with a little more Wikipedia-ing this leads to hecto, the prefix for "one hundred" as a unit of measure.  I'm not sure why Microsoft chose hekaton for such a plan name, but having already learned some Greek I figured I might as well dig some more in the DLL text for hekaton.  Here's what I found: hekaton_slow_param_passing Occurs when a Hekaton procedure call dispatch goes to slow parameter passing code path The reason why Hekaton parameter passing code took the slow code path hekaton_slow_param_pass_reason sp_deploy_hekaton_database sp_undeploy_hekaton_database sp_drop_hekaton_database sp_checkpoint_hekaton_database sp_restore_hekaton_database e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\hkproc.cpp e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\matgen.cpp e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\matquery.cpp e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\sqlmeta.cpp e:\sql11_main_t\sql\ntdbms\hekaton\sqlhost\sqllang\resultset.cpp Interesting!  The first 4 entries (in red) mention parameters and "slow code".  Could this be the foundation of the mythical DBCC RUNFASTER command?  Have I been passing my parameters the slow way all this time? And what about those sp_xxxx_hekaton_database procedures (in blue)? Could THEY be the secret to a faster SQL Server? Could they promise a "hundredfold" improvement in performance?  Are these special, super-undocumented DIB (databases in black)? I decided to look in the SQL Server system views for any objects with hekaton in the name, or references to them, in hopes of discovering some new code that would answer all my questions: SELECT name FROM sys.all_objects WHERE name LIKE '%hekaton%' SELECT name FROM sys.all_objects WHERE object_definition(OBJECT_ID) LIKE '%hekaton%' Which revealed: name ------------------------ (0 row(s) affected) name ------------------------ sp_createstats sp_recompile sp_updatestats (3 row(s) affected)   Hmm.  Well that didn't find much.  Looks like these procedures are seriously undocumented, unknown, perhaps forbidden knowledge. Maybe a part of some unspeakable evil? (No, I'm not paranoid, I just like mysteries and thought that punching this up with that kind of thing might keep you reading.  I know I'd fall asleep without it.) OK, so let's check out those 3 procedures and see what they reveal when I search for "Hekaton": sp_createstats: -- filter out local temp tables, Hekaton tables, and tables for which current user has no permissions -- Note that OBJECTPROPERTY returns NULL on type="IT" tables, thus we only call it on type='U' tables   OK, that's interesting, let's go looking down a little further: ((@table_type<>'U') or (0 = OBJECTPROPERTY(@table_id, 'TableIsInMemory'))) and -- Hekaton table   Wellllll, that tells us a few new things: There's such a thing as Hekaton tables (UPDATE: I'm not the only one to have found them!) They are not standard user tables and probably not in memory UPDATE: I misinterpreted this because I didn't read all the code when I wrote this blog post. The OBJECTPROPERTY function has an undocumented TableIsInMemory option Let's check out sp_recompile: -- (3) Must not be a Hekaton procedure.   And once again go a little further: if (ObjectProperty(@objid, 'IsExecuted') <> 0 AND ObjectProperty(@objid, 'IsInlineFunction') = 0 AND ObjectProperty(@objid, 'IsView') = 0 AND -- Hekaton procedure cannot be recompiled -- Make them go through schema version bumping branch, which will fail ObjectProperty(@objid, 'ExecIsCompiledProc') = 0)   And now we learn that hekaton procedures also exist, they can't be recompiled, there's a "schema version bumping branch" somewhere, and OBJECTPROPERTY has another undocumented option, ExecIsCompiledProc.  (If you experiment with this you'll find this option returns null, I think it only works when called from a system object.) This is neat! Sadly sp_updatestats doesn't reveal anything new, the comments about hekaton are the same as sp_createstats.  But we've ALSO discovered undocumented features for the OBJECTPROPERTY function, which we can now search for: SELECT name, object_definition(OBJECT_ID) FROM sys.all_objects WHERE object_definition(OBJECT_ID) LIKE '%OBJECTPROPERTY(%'   I'll leave that to you as more homework.  I should add that searching the system procedures was recommended long ago by the late, great Ken Henderson, in his Guru's Guide books, as a great way to find undocumented features.  That seems to be really good advice! Now if you're a programmer/hacker, you've probably been drooling over the last 5 entries for hekaton (in green), because these are the names of source code files for SQL Server!  Does this mean we can access the source code for SQL Server?  As The Oracle suggested to Neo, can we return to The Source??? Actually, no. Well, maybe a little bit.  While you won't get the actual source code from the compiled DLL and EXE files, you'll get references to source files, debugging symbols, variables and module names, error messages, and even the startup flags for SQL Server.  And if you search for "DBCC" or "CHECKDB" you'll find a really nice section listing all the DBCC commands, including the undocumented ones.  Granted those are pretty easy to find online, but you may be surprised what those web sites DIDN'T tell you! (And neither will I, go look for yourself!)  And as we saw earlier, you'll also find execution plan elements, query processing rules, and who knows what else.  It's also instructive to see how Microsoft organizes their source directories, how various components (storage engine, query processor, Full Text, AlwaysOn/HADR) are split into smaller modules. There are over 2000 source file references, go do some exploring! So what did we learn?  We can pull strings out of executable files, search them for known items, browse them for unknown items, and use the results to examine internal code to learn even more things about SQL Server.  We've even learned how to use command-line utilities!  We are now 1337 h4X0rz!  (Not really.  I hate that leetspeak crap.) Although, I must confess I might've gone too far with the "conspiracy" part of this post.  I apologize for that, it's just my overactive imagination.  There's really no hidden agenda or conspiracy regarding SQL Server internals.  It's not The Matrix.  It's not like you'd find anything like that in there: Attach Matrix Database DM_MATRIX_COMM_PIPELINES MATRIXXACTPARTICIPANTS dm_matrix_agents   Alright, enough of this paranoid ranting!  Microsoft are not really evil!  It's not like they're The Borg from Star Trek: ALTER FEDERATION DROP ALTER FEDERATION SPLIT DROP FEDERATION   #tsql2sday

    Read the article

  • Nice take on Open Source

    - by EmbeddedInsider
    I just revisited the “Micro Framework”- Microsoft’s bootable runtime, essentially an OS that allows managed code to run on small 32bit CPUs, even without Memory Management.  Things are happening http://msdn.microsoft.com/en-us/netframework/bb267253.aspx Abstract The Microsoft .NET Micro Framework is a bootable runtime module that brings the advantages of .NET programming to devices too resource-constrained to run other Microsoft embedded platforms. The benefits of developing with the .NET Micro Framework include the C# programming language, a managed execution environment, a substantial subset of the .NET libraries, and Visual Studio™ deployment and debugging. In this white paper we explain why the .NET Micro Framework is an ideal choice for embedded development and provide technical details of the platform’s Hardware Abstraction Layer (HAL) and Common Language Runtime (CLR). “Micro Framework” is an interesting product, it is very low cost, like zero. And it is largely community controlled under the Apache License.  A partner network is building, and the application environment is .NET. I have been following this for some time, and the community open source approach seems to be working.  There are new features/packages emerging, for example an F# programming language (ARGH! I am still wresting with VB and C#). Anyway, what I found most interesting was a port to Tron.  Tron is a very popular Japanese open source intuitive.  It is a very real time, very compact kernel, and is, like the Micro Framework, ‘free as beer’.  One limit on MF was it was not real time.  But the merger with Tron may eliminate that problem.  Certainly, if I were dealing with a consumer product with quantities in the millions (like a SmartGrid device, or a toy) I would seriously consider something out of this technology pool.

    Read the article

  • Custom Profile Provider with Web Deployment Project

    - by Ben Griswold
    I wrote about implementing a custom profile provider inside of your ASP.NET MVC application yesterday. If you haven’t read the article, don’t sweat it.  Most of the stuff I write is rubbish anyway. Since you have joined me today, though, I might as well offer up a little tip: you can run into trouble, like I did, if you enable your custom profile provider inside of an application which is deployed using a Web Deployment Project.  Everything will run great on your local machine and you’ll probably take an early lunch because you got the code running in no time flat and the build server is happy and all tests pass and, gosh, maybe you’ll just cut out early because it is Friday after all.  But then the first user hits the integration machine and, that’s right, yellow screen of death. Lucky you, just as you’re walking out the door, the user kindly sends the exception message and stack trace: Value cannot be null. Parameter name: type Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code. Stack Trace: [ArgumentNullException: Value cannot be null. Parameter name: type] System.Activator.CreateInstance(Type type, Boolean nonPublic) +2796915 System.Web.Profile.ProfileBase.CreateMyInstance(String username, Boolean isAuthenticated) +76 System.Web.Profile.ProfileBase.Create(String username, Boolean isAuthenticated) +312 User error?  Not this time. Damn! One hour later… you notice the harmless “Treat as library component (remove the App_Code.compiled file)” setting on the Output Assemblies Tab of your Web Deployment Project. You have no idea why, but you uncheck it.  You test and everything works great both locally and on the integration machine.  Application users think you’re the best and you’re still going to catch the last half hour of happy hour.  Happy Friday.

    Read the article

  • Elastic versus Distributed in caching.

    - by Mike Reys
    Until now, I hadn't heard about Elastic Caching yet. Today I read Mike Gualtieri's Blog entry. I immediately thought about Oracle Coherence and got a little scare throughout the reading. Elastic Caching is the next step after Distributed Caching. As we've always positioned Coherence as a Distributed Cache, I thought for a brief instance that Oracle had missed a new trend/technology. But then I started reading the characteristics of an Elastic Cache. Forrester definition: Software infrastructure that provides application developers with data caching services that are distributed across two or more server nodes that consistently perform as volumes grow can be scaled without downtime provide a range of fault-tolerance levels Hey wait a minute, doesn't Coherence fullfill all these requirements? Oh yes, I think it does! The next defintion in the article is about Elastic Application Platforms. This is mainly more of the same with the addition of code execution. Now there is analytics functionality in Oracle Coherence. The analytics capability provides data-centric functions like distributed aggregation, searching and sorting. Coherence also provides continuous querying and event-handling. I think that when it comes to providing an Elastic Application Platform (as in the Forrester definition), Oracle is close, nearly there. And what's more, as Elastic Platform is the next big thing towards the big C word, Oracle Coherence makes you cloud-ready ;-) There you go! Find more info on Oracle Coherence here.

    Read the article

  • SQL SERVER – Plan Cache and Data Cache in Memory

    - by pinaldave
    I get following question almost all the time when I go for consultations or training. I often end up providing the scripts to my clients and attendees. Instead of writing new blog post, today in this single blog post, I am going to cover both the script and going to link to original blog posts where I have mentioned about this blog post. Plan Cache in Memory USE AdventureWorks GO SELECT [text], cp.size_in_bytes, plan_handle FROM sys.dm_exec_cached_plans AS cp CROSS APPLY sys.dm_exec_sql_text(plan_handle) WHERE cp.cacheobjtype = N'Compiled Plan' ORDER BY cp.size_in_bytes DESC GO Further explanation of this script is over here: SQL SERVER – Plan Cache – Retrieve and Remove – A Simple Script Data Cache in Memory USE AdventureWorks GO SELECT COUNT(*) AS cached_pages_count, name AS BaseTableName, IndexName, IndexTypeDesc FROM sys.dm_os_buffer_descriptors AS bd INNER JOIN ( SELECT s_obj.name, s_obj.index_id, s_obj.allocation_unit_id, s_obj.OBJECT_ID, i.name IndexName, i.type_desc IndexTypeDesc FROM ( SELECT OBJECT_NAME(OBJECT_ID) AS name, index_id ,allocation_unit_id, OBJECT_ID FROM sys.allocation_units AS au INNER JOIN sys.partitions AS p ON au.container_id = p.hobt_id AND (au.TYPE = 1 OR au.TYPE = 3) UNION ALL SELECT OBJECT_NAME(OBJECT_ID) AS name, index_id, allocation_unit_id, OBJECT_ID FROM sys.allocation_units AS au INNER JOIN sys.partitions AS p ON au.container_id = p.partition_id AND au.TYPE = 2 ) AS s_obj LEFT JOIN sys.indexes i ON i.index_id = s_obj.index_id AND i.OBJECT_ID = s_obj.OBJECT_ID ) AS obj ON bd.allocation_unit_id = obj.allocation_unit_id WHERE database_id = DB_ID() GROUP BY name, index_id, IndexName, IndexTypeDesc ORDER BY cached_pages_count DESC; GO Further explanation of this script is over here: SQL SERVER – Get Query Plan Along with Query Text and Execution Count Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, SQL, SQL Authority, SQL Optimization, SQL Performance, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL Tagged: SQL Memory

    Read the article

  • July, the 31 Days of SQL Server DMO’s – Day 1 (sys.dm_exec_requests)

    - by Tamarick Hill
    The first DMO that I would like to introduce you to is one of the most common and basic DMV’s out there. I use the term DMV because this DMO is actually a view as opposed to a function. This DMV is server-scoped and it returns information about all requests that are currently executing on your SQL Server instance. To illustrate what this DMV returns, lets take a look at the results. As you can see, this DMV returns a wealth of information about requests occurring on your server. You are able to see the SPID, the start time of a request, current status, and the command the SPID is executing. In addition to this you see columns for sql_handle and plan_handle. These columns (when combined with other DMO’s we will discuss later) can return the actual sql text that is being executed on your server as well as the actual execution plan that is cached and being used. This DMV also returns information about various wait types that may be occurring for your spid. The percent_complete column displays a percentage to completion for certain database actions such as DBCC CheckDB, Database Restores, Rollback’s, etc. In addition to these, you are also able to see the amount of reads, writes, and cpu that the SPID has consumed. You will find this DMV to be one of the primary DMV’s that you use when looking for information about what is occurring on your server.

    Read the article

  • Growing Into Enterprise Architecture

    - by pat.shepherd
    I am writing this post as I am in an Enterprise Architecture class, specifically on the Oracle Enterprise Architecture Framework (OEAF).  I have been a long believer that SOA’s key strength is that it is the first IT approach that blends or unifies business and technology.  That is a common view and is certainly valid but is not completely true (or at least accurate).  As my personal view of EA is growing, I realize more than ever that doing EA is FAR MORE than creating a reference architecture, creating a physical architecture or picking a technology to standardize on.  Those are parts of the puzzle but not the whole puzzle by any stretch. I am now a firm believer that the various EA frameworks out there provide the rigor and structure required to allow the bridging of business strategy / vision to IT strategy / vision. The flow goes something like this: Business Strategy –> Business / Application / Information / Technology Architecture –> SOA Reference Architecture –> SOA Functional Architecture.  Governance is imbued throughout to help map, measure and verify the business-to-IT coherence. With those in place, then (and only then) can SOA fulfill it’s potential to be more that an integration strategy, more than a reuse strategy; but also a foundation for tying the results of IT to business vision. Fortunately, EA is a an ongoing process that it is never too late to get started with an understanding of frameworks such as TOGAF, FEA, or OEAF.  Also, EA is never ending in that it always needs to be apply, even once a full-blown Enterprise Architecture is established it needs to be constantly evolved.  For those who are getting deeper into EA as a discipline, there is plenty runway to grow as your company/customer begins to look more seriously at EA. I will close with a pointer to a Great Book I have recently read on this subject: Enterprise Architecture as Strategy (http://www.amazon.com/Enterprise-Architecture-Strategy-Foundation-Execution/dp/1591398398/ref=sr_1_1?ie=UTF8&s=books&qid=1268842865&sr=1-1)

    Read the article

  • Don’t miss the Oracle Webcast: Enabling Effective Decision Making with “One Source of the Truth” at BB&T

    - by Rob Reynolds
    Webcast Date:  September 17th, 2012  -  9 a.m. PT / 12 p.m. ET  BB&T Corporation (NYSE: BBT) is one of the largest financial services holding companies in the United States. One of their IT goals is to provide “one source of truth” to enable more effective decision making at the corporate and local level. By using Oracle’s Hyperion Enterprise Planning Suite and Oracle Essbase, BB&T streamlined their planning and financial reporting processes. Large volumes of data were consolidated into a single reporting solution giving stakeholders more timely and accurate information. By providing a central and automated collaboration tool, BB&T is able to prepare more accurate financial forecasts, rapidly consolidate large amounts of data, and make more informed decisions. Join us on September 17th for a live webcast to hear BB&T’s journey to achieve “One Source of Truth” and learn how Oracle’s Hyperion Planning Suite and Oracle’s Essbase allows you to: Adopt best practices like rolling forecasts and driver-based planning Reduce the time and effort dedicated to the annual budget process Reduce the time and effort dedicated to the annual budget process Remove forecasting uncertainty with predictive modeling capabilities Rapidly analyze shifting market conditions with a powerful calculation engine Prioritize resources effectively with complete visibility into all potential risks Link strategy and execution with integrated strategic, financial and operational planning Register here.

    Read the article

  • WebLogic Weekly for June 20th, 2011

    - by james.bayer
    Welcome the first the first edition of the WebLogic Weekly.  The WebLogic Server team has been trying to extend our community outreach to new mediums like an Oracle WebLogic Youtube Channel (how-to videos and feature showcases), Twitter (sharing WebLogic links, typically blogs), and a Facebook page to do a better job sharing information, providing learning alternatives to product documentation and perhaps most importantly collecting feedback from all of our users using the tools they prefer.  This is our attempt to provide a round-up what has been going on in WebLogic over the past week.  If you would like to have something shared here, use the #weblogic tag on tweets, post on the Oracle WebLogic facebook page, or comment on these blog entries. Blogs WebLogic Server: Listing Groups of an Authenticated User by Steve Button Weblogic, QBrowser And Topics by Eric Elzinga Weblogic, Topics And (Non)-Durable Subscribers by Eric Elzinga Database Web Service using Toplink DB Provider by Vishal Jain WebLogic Server – Use the Execution Context ID in Applications – Lessons From Hansel and Gretel by James Bayer Getting All Server’s Lifecycle State in a Domain by Jay SenSharma Steps to Move Messages From One Queue To Another Queue Using WLST (Updated Version) by Ravish Mody Events If you want to share a story of something innovative you or your organization has done with WebLogic Server or other Fusion Middleware, you could win a pass to Oracle Open World 2011 and share the story there.  See Ruma Sanyal's posting on the Application Grid blog for details.  The deadline for submissions is July 22nd, 2011.

    Read the article

  • Oracle Technology Network Virtual Developer Day: Service Oriented Architecture (SOA)

    - by programmarketingOTN
    Register now! Oracle Technology Network Virtual Developer Day: Service Oriented Architecture (SOA) - Discover the Power of Oracle SOA Suite 11gTuesday July 12, 2011 - ?9:00 a.m. PT – 1:30 p.m. PT / 12 Noon EDT - 4:30 p.m. EDTOTN is proud to host another Virtual Developer Day, this time focusing on SOA (click here to check out on-demand version of Rich Enterprise Applications and WebLogic)  Save yourself/company some money and join us online for this hands-on virtual workshop. Through developer-focused product presentations and demonstrations delivered by Oracle product and technology experts, there is no faster or more efficient way to jumpstart your Oracle SOA suite learning.Over the course of the Virtual Developer Day, you will learn how an SOA approach can be implemented, whether starting fresh with new services or reusing existing services. Using Oracle SOA Suite 11g components, you will explore, modify, execute, and monitor an SOA composite application. Topics include SCA, BPEL process execution, adapters, business rules and more.Java and WebLogic experience not required for the presentations or demonstrations but it is a plus for the hands-on lab.Come to this event if you are    •    Exploring ways to deliver services faster    •    Integrating packaged and/or legacy applications    •    Developing service orchestration    •    Planning or starting new development projectsRegister online now for this FREE event.AGENDA - Tuesday July 12, 2011?9:00 a.m. PT – 1:30 p.m. PT / 12 Noon EDT - 4:30 p.m. PT EDT  Time  Title  9:00 AM Keynote  9:15 AM Presentation 1 Service Oriented Architecture (SOA) Overview  9:45 AM Demonstration 1 Mediator and Adapters  10:15 AM Presentation 2 BPEL Service Orchestration and Business Rules  10:45 AM Demonstration 2 BPEL Service Orchestration  11:15 AM Demonstration 3 Oracle Business Rules  11:45 AM Hands-on Lab time  1:30 PM Close Register online now for this FREE event.

    Read the article

  • SharePoint Apps and Windows Azure

    - by ScottGu
    Last Monday I had an opportunity to present as part of the keynote of this year’s SharePoint Conference.  My segment of the keynote covered the new SharePoint Cloud App Model we are introducing as part of the upcoming SharePoint 2013 and Office 365 releases.  This new app model for SharePoint is additive to the full trust solutions developers write today, and is built around three core tenants: Simplifying the development model and making it consistent between the on-premises version of SharePoint and SharePoint Online provided with Office 365. Making the execution model loosely coupled – and enabling developers to build apps and write code that can run outside of the core SharePoint service. This makes it easy to deploy SharePoint apps using Windows Azure, and avoid having to worry about breaking SharePoint and the apps within it when something is upgraded.  This new loosely coupled model also enables developers to write SharePoint applications that can leverage the full capabilities of the .NET Framework – including ASP.NET Web Forms 4.5, ASP.NET MVC 4, ASP.NET Web API, EF 5, Async, and more. Implementing this loosely coupled model using standard web protocols – like OAuth, JSON, and REST APIs – that enable developers to re-use skills and tools, and easily integrate SharePoint with Web and Mobile application architectures. A video of my talk + demos is now available to watch online: In the talk I walked through building an app from scratch – it showed off how easy it is to build solutions using new SharePoint application, and highlighted a web + workflow + mobile scenario that integrates SharePoint with code hosted on Windows Azure (all built using Visual Studio 2012 and ASP.NET 4.5 – including MVC and Web API). The new SharePoint Cloud App Model is something that I think is pretty exciting, and it is going to make it a lot easier to build SharePoint apps using the full power of both Windows Azure and the .NET Framework.  Using Windows Azure to easily extend SaaS based solutions like Office 365 is also a really natural fit and one that is going to offer a bunch of great developer opportunities.  Hope this helps, Scott  P.S. In addition to blogging, I am also now using Twitter for quick updates and to share links. Follow me at: twitter.com/scottgu

    Read the article

  • Scheduling of jobs in the presence of constraints in Java

    - by Asgard
    I want to know how to implement a solution to this problem: A task is performed by running, by more people, some basic jobs with known duration in time units (days, months, etc..). The execution of the jobs could lead to the existence of time constraints: a job, for example, can not start if it is not over another (or others) and so on. I want to design and build an application to check the correctness of jobs activities and to propose a schedule of jobs, if any, which is respectful of the constraints. Input must provide the jobs and associated constraints. The expected output is the scheduling of jobs. The specification of an elementary job consists of the pair <jobs-id, duration> A constraint is expressed by means of a quintuple of the type <S/E, id-job1, B/A, S/E, id-job2> the beginning (S) or the end (E) of a jobs Id-job1, must take place before (B) / after (A) of the beginning (S) / end (E) of the Id-job2. If there are no dependencies between some jobs, then jobs can be done before, in parallel. As a simple example, consider the input: jobs jobs(0, 3) jobs(1, 4) jobs(2, 5) jobs(3, 3) jobs(4, 3) constraints constraints(S, 1, A, E, 0) constraints(S, 4, A, E, 2) Possible output: t 0 1 2 3 4 0 * - * * - 1 * - * * - 2 * - * * - 3 * - * * - 4 - * * - - 5 - * * - - 6 - * - - * 7 - * - - * 8 - * - - * 9 - - - - * How to code an efficient java scheduler(avoiding the intense backtracking if is possible) to manage the jobs with these constraints, as described??? I have seen a discussion on a thread in a forum where an user seems has solved the problem easily, but He haven't given enough details to the users to compile a working project(I'm noob), and I'm interested to know an effective implementation of the solution (without using external libraries). If someone help me, I'll give to him a very good feedback ;)

    Read the article

  • Getting the alternative to the 200-Line Linux Kernel patch to work

    - by Gödel
    Apparently, there is a comparable alternative to the 200-line kernel patch that involves no kernel upgrade. It is presented here and discussed here. However, I am not sure if webupd8's solution (under the section "Use it in Ubuntu") on Ubuntu actually works or not. In particular, one commenter on ./ is saying he's getting an error message. Could anyone post the "correct" method that actually works? Suggested solution: Based on the comments I've read so far, the following seems to work. (1) In /etc/rc.local, add the following lines to above exit 0: mkdir -p /dev/cgroup/cpu mount -t cgroup cgroup /dev/cgroup/cpu -o cpu mkdir -m 0777 /dev/cgroup/cpu/user echo "/usr/local/sbin/cgroup_clean" > /dev/cgroup/cpu/release_agent (2) Create a file named /usr/local/sbin/cgroup_clean with the following content: #!/bin/sh rmdir /dev/cgroup/cpu/$1 (3) In your ~/.bashrc, add: if [ "$PS1" ] ; then mkdir -m 0700 /dev/cgroup/cpu/user/$$ echo $$ > /dev/cgroup/cpu/user/$$/tasks echo "1" > /dev/cgroup/cpu/user/$$/notify_on_release fi (4) (To make sure the execution bit is on) execute sudo chmod +x /usr/local/sbin/cgroup_clean /etc/rc.local (5) Reboot.

    Read the article

  • Investigation: Can different combinations of components effect Dataflow performance?

    - by jamiet
    Introduction The Dataflow task is one of the core components (if not the core component) of SQL Server Integration Services (SSIS) and often the most misunderstood. This is not surprising, its an incredibly complicated beast and we’re abstracted away from that complexity via some boxes that go yellow red or green and that have some lines drawn between them. Example dataflow In this blog post I intend to look under that facade and get into some of the nuts and bolts of the Dataflow Task by investigating how the decisions we make when building our packages can affect performance. I will do this by comparing the performance of three dataflows that all have the same input, all produce the same output, but which all operate slightly differently by way of having different transformation components. I also want to use this blog post to challenge a common held opinion that I see perpetuated over and over again on the SSIS forum. That is, that people assume adding components to a dataflow will be detrimental to overall performance. Its not surprising that people think this –it is intuitive to think that more components means more work- however this is not a view that I share. I have always been of the opinion that there are many factors affecting dataflow duration and the number of components is actually one of the less important ones; having said that I have never proven that assertion and that is one reason for this investigation. I have actually seen evidence that some people think dataflow duration is simply a function of number of rows and number of components. I’ll happily call that one out as a myth even without any investigation!  The Setup I have a 2GB datafile which is a list of 4731904 (~4.7million) customer records with various attributes against them and it contains 2 columns that I am going to use for categorisation: [YearlyIncome] [BirthDate] The data file is a SSIS raw format file which I chose to use because it is the quickest way of getting data into a dataflow and given that I am testing the transformations, not the source or destination adapters, I want to minimise external influences as much as possible. In the test I will split the customers according to month of birth (12 of those) and whether or not their yearly income is above or below 50000 (2 of those); in other words I will be splitting them into 24 discrete categories and in order to do it I shall be using different combinations of SSIS’ Conditional Split and Derived Column transformation components. The 24 datapaths that occur will each input to a rowcount component, again because this is the least resource intensive means of terminating a datapath. The test is being carried out on a Dell XPS Studio laptop with a quad core (8 logical Procs) Intel Core i7 at 1.73GHz and Samsung SSD hard drive. Its running SQL Server 2008 R2 on Windows 7. The Variables Here are the three combinations of components that I am going to test:     One Conditional Split - A single Conditional Split component CSPL Split by Month of Birth and income category that will use expressions on [YearlyIncome] & [BirthDate] to send each row to one of 24 outputs. This next screenshot displays the expression logic in use: Derived Column & Conditional Split - A Derived Column component DER Income Category that adds a new column [IncomeCategory] which will contain one of two possible text values {“LessThan50000”,”GreaterThan50000”} and uses [YearlyIncome] to determine which value each row should get. A Conditional Split component CSPL Split by Month of Birth and Income Category then uses that new column in conjunction with [BirthDate] to determine which of the same 24 outputs to send each row to. Put more simply, I am separating the Conditional Split of #1 into a Derived Column and a Conditional Split. The next screenshots display the expression logic in use: DER Income Category         CSPL Split by Month of Birth and Income Category       Three Conditional Splits - A Conditional Split component that produces two outputs based on [YearlyIncome], one for each Income Category. Each of those outputs will go to a further Conditional Split that splits the input into 12 outputs, one for each month of birth (identical logic in each). In this case then I am separating the single Conditional Split of #1 into three Conditional Split components. The next screenshots display the expression logic in use: CSPL Split by Income Category         CSPL Split by Month of Birth 1& 2       Each of these combinations will provide an input to one of the 24 rowcount components, just the same as before. For illustration here is a screenshot of the dataflow containing three Conditional Split components: As you can these dataflows have a fair bit of work to do and remember that they’re doing that work for 4.7million rows. I will execute each dataflow 10 times and use the average for comparison. I foresee three possible outcomes: The dataflow containing just one Conditional Split (i.e. #1) will be quicker There is no significant difference between any of them One of the two dataflows containing multiple transformation components will be quicker Regardless of which of those outcomes come to pass we will have learnt something and that makes this an interesting test to carry out. Note that I will be executing the dataflows using dtexec.exe rather than hitting F5 within BIDS. The Results and Analysis The table below shows all of the executions, 10 for each dataflow. It also shows the average for each along with a standard deviation. All durations are in seconds. I’m pasting a screenshot because I frankly can’t be bothered with the faffing about needed to make a presentable HTML table. It is plain to see from the average that the dataflow containing three conditional splits is significantly faster, the other two taking 43% and 52% longer respectively. This seems strange though, right? Why does the dataflow containing the most components outperform the other two by such a big margin? The answer is actually quite logical when you put some thought into it and I’ll explain that below. Before progressing, a side note. The standard deviation for the “Three Conditional Splits” dataflow is orders of magnitude smaller – indicating that performance for this dataflow can be predicted with much greater confidence too. The Explanation I refer you to the screenshot above that shows how CSPL Split by Month of Birth and salary category in the first dataflow is setup. Observe that there is a case for each combination of Month Of Date and Income Category – 24 in total. These expressions get evaluated in the order that they appear and hence if we assume that Month of Date and Income Category are uniformly distributed in the dataset we can deduce that the expected number of expression evaluations for each row is 12.5 i.e. 1 (the minimum) + 24 (the maximum) divided by 2 = 12.5. Now take a look at the screenshots for the second dataflow. We are doing one expression evaluation in DER Income Category and we have the same 24 cases in CSPL Split by Month of Birth and Income Category as we had before, only the expression differs slightly. In this case then we have 1 + 12.5 = 13.5 expected evaluations for each row – that would account for the slightly longer average execution time for this dataflow. Now onto the third dataflow, the quick one. CSPL Split by Income Category does a maximum of 2 expression evaluations thus the expected number of evaluations per row is 1.5. CSPL Split by Month of Birth 1 & CSPL Split by Month of Birth 2 both have less work to do than the previous Conditional Split components because they only have 12 cases to test for thus the expected number of expression evaluations is 6.5 There are two of them so total expected number of expression evaluations for this dataflow is 6.5 + 6.5 + 1.5 = 14.5. 14.5 is still more than 12.5 & 13.5 though so why is the third dataflow so much quicker? Simple, the conditional expressions in the first two dataflows have two boolean predicates to evaluate – one for Income Category and one for Month of Birth; the expressions in the Conditional Split in the third dataflow however only have one predicate thus they are doing a lot less work. To sum up, the difference in execution times can be attributed to the difference between: MONTH(BirthDate) == 1 && YearlyIncome <= 50000 and MONTH(BirthDate) == 1 In the first two dataflows YearlyIncome <= 50000 gets evaluated an average of 12.5 times for every row whereas in the third dataflow it is evaluated once and once only. Multiply those 11.5 extra operations by 4.7million rows and you get a significant amount of extra CPU cycles – that’s where our duration difference comes from. The Wrap-up The obvious point here is that adding new components to a dataflow isn’t necessarily going to make it go any slower, moreover you may be able to achieve significant improvements by splitting logic over multiple components rather than one. Performance tuning is all about reducing the amount of work that needs to be done and that doesn’t necessarily mean use less components, indeed sometimes you may be able to reduce workload in ways that aren’t immediately obvious as I think I have proven here. Of course there are many variables in play here and your mileage will most definitely vary. I encourage you to download the package and see if you get similar results – let me know in the comments. The package contains all three dataflows plus a fourth dataflow that will create the 2GB raw file for you (you will also need the [AdventureWorksDW2008] sample database from which to source the data); simply disable all dataflows except the one you want to test before executing the package and remember, execute using dtexec, not within BIDS. If you want to explore dataflow performance tuning in more detail then here are some links you might want to check out: Inequality joins, Asynchronous transformations and Lookups Destination Adapter Comparison Don’t turn the dataflow into a cursor SSIS Dataflow – Designing for performance (webinar) Any comments? Let me know! @Jamiet

    Read the article

  • C#/.NET Little Wonders: The Predicate, Comparison, and Converter Generic Delegates

    - by James Michael Hare
    Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here. In the last three weeks, we examined the Action family of delegates (and delegates in general), the Func family of delegates, and the EventHandler family of delegates and how they can be used to support generic, reusable algorithms and classes. This week I will be completing my series on the generic delegates in the .NET Framework with a discussion of three more, somewhat less used, generic delegates: Predicate<T>, Comparison<T>, and Converter<TInput, TOutput>. These are older generic delegates that were introduced in .NET 2.0, mostly for use in the Array and List<T> classes.  Though older, it’s good to have an understanding of them and their intended purpose.  In addition, you can feel free to use them yourself, though obviously you can also use the equivalents from the Func family of delegates instead. Predicate<T> – delegate for determining matches The Predicate<T> delegate was a very early delegate developed in the .NET 2.0 Framework to determine if an item was a match for some condition in a List<T> or T[].  The methods that tend to use the Predicate<T> include: Find(), FindAll(), FindLast() Uses the Predicate<T> delegate to finds items, in a list/array of type T, that matches the given predicate. FindIndex(), FindLastIndex() Uses the Predicate<T> delegate to find the index of an item, of in a list/array of type T, that matches the given predicate. The signature of the Predicate<T> delegate (ignoring variance for the moment) is: 1: public delegate bool Predicate<T>(T obj); So, this is a delegate type that supports any method taking an item of type T and returning bool.  In addition, there is a semantic understanding that this predicate is supposed to be examining the item supplied to see if it matches a given criteria. 1: // finds first even number (2) 2: var firstEven = Array.Find(numbers, n => (n % 2) == 0); 3:  4: // finds all odd numbers (1, 3, 5, 7, 9) 5: var allEvens = Array.FindAll(numbers, n => (n % 2) == 1); 6:  7: // find index of first multiple of 5 (4) 8: var firstFiveMultiplePos = Array.FindIndex(numbers, n => (n % 5) == 0); This delegate has typically been succeeded in LINQ by the more general Func family, so that Predicate<T> and Func<T, bool> are logically identical.  Strictly speaking, though, they are different types, so a delegate reference of type Predicate<T> cannot be directly assigned to a delegate reference of type Func<T, bool>, though the same method can be assigned to both. 1: // SUCCESS: the same lambda can be assigned to either 2: Predicate<DateTime> isSameDayPred = dt => dt.Date == DateTime.Today; 3: Func<DateTime, bool> isSameDayFunc = dt => dt.Date == DateTime.Today; 4:  5: // ERROR: once they are assigned to a delegate type, they are strongly 6: // typed and cannot be directly assigned to other delegate types. 7: isSameDayPred = isSameDayFunc; When you assign a method to a delegate, all that is required is that the signature matches.  This is why the same method can be assigned to either delegate type since their signatures are the same.  However, once the method has been assigned to a delegate type, it is now a strongly-typed reference to that delegate type, and it cannot be assigned to a different delegate type (beyond the bounds of variance depending on Framework version, of course). Comparison<T> – delegate for determining order Just as the Predicate<T> generic delegate was birthed to give Array and List<T> the ability to perform type-safe matching, the Comparison<T> was birthed to give them the ability to perform type-safe ordering. The Comparison<T> is used in Array and List<T> for: Sort() A form of the Sort() method that takes a comparison delegate; this is an alternate way to custom sort a list/array from having to define custom IComparer<T> classes. The signature for the Comparison<T> delegate looks like (without variance): 1: public delegate int Comparison<T>(T lhs, T rhs); The goal of this delegate is to compare the left-hand-side to the right-hand-side and return a negative number if the lhs < rhs, zero if they are equal, and a positive number if the lhs > rhs.  Generally speaking, null is considered to be the smallest value of any reference type, so null should always be less than non-null, and two null values should be considered equal. In most sort/ordering methods, you must specify an IComparer<T> if you want to do custom sorting/ordering.  The Array and List<T> types, however, also allow for an alternative Comparison<T> delegate to be used instead, essentially, this lets you perform the custom sort without having to have the custom IComparer<T> class defined. It should be noted, however, that the LINQ OrderBy(), and ThenBy() family of methods do not support the Comparison<T> delegate (though one could easily add their own extension methods to create one, or create an IComparer() factory class that generates one from a Comparison<T>). So, given this delegate, we could use it to perform easy sorts on an Array or List<T> based on custom fields.  Say for example we have a data class called Employee with some basic employee information: 1: public sealed class Employee 2: { 3: public string Name { get; set; } 4: public int Id { get; set; } 5: public double Salary { get; set; } 6: } And say we had a List<Employee> that contained data, such as: 1: var employees = new List<Employee> 2: { 3: new Employee { Name = "John Smith", Id = 2, Salary = 37000.0 }, 4: new Employee { Name = "Jane Doe", Id = 1, Salary = 57000.0 }, 5: new Employee { Name = "John Doe", Id = 5, Salary = 60000.0 }, 6: new Employee { Name = "Jane Smith", Id = 3, Salary = 59000.0 } 7: }; Now, using the Comparison<T> delegate form of Sort() on the List<Employee>, we can sort our list many ways: 1: // sort based on employee ID 2: employees.Sort((lhs, rhs) => Comparer<int>.Default.Compare(lhs.Id, rhs.Id)); 3:  4: // sort based on employee name 5: employees.Sort((lhs, rhs) => string.Compare(lhs.Name, rhs.Name)); 6:  7: // sort based on salary, descending (note switched lhs/rhs order for descending) 8: employees.Sort((lhs, rhs) => Comparer<double>.Default.Compare(rhs.Salary, lhs.Salary)); So again, you could use this older delegate, which has a lot of logical meaning to it’s name, or use a generic delegate such as Func<T, T, int> to implement the same sort of behavior.  All this said, one of the reasons, in my opinion, that Comparison<T> isn’t used too often is that it tends to need complex lambdas, and the LINQ ability to order based on projections is much easier to use, though the Array and List<T> sorts tend to be more efficient if you want to perform in-place ordering. Converter<TInput, TOutput> – delegate to convert elements The Converter<TInput, TOutput> delegate is used by the Array and List<T> delegate to specify how to convert elements from an array/list of one type (TInput) to another type (TOutput).  It is used in an array/list for: ConvertAll() Converts all elements from a List<TInput> / TInput[] to a new List<TOutput> / TOutput[]. The delegate signature for Converter<TInput, TOutput> is very straightforward (ignoring variance): 1: public delegate TOutput Converter<TInput, TOutput>(TInput input); So, this delegate’s job is to taken an input item (of type TInput) and convert it to a return result (of type TOutput).  Again, this is logically equivalent to a newer Func delegate with a signature of Func<TInput, TOutput>.  In fact, the latter is how the LINQ conversion methods are defined. So, we could use the ConvertAll() syntax to convert a List<T> or T[] to different types, such as: 1: // get a list of just employee IDs 2: var empIds = employees.ConvertAll(emp => emp.Id); 3:  4: // get a list of all emp salaries, as int instead of double: 5: var empSalaries = employees.ConvertAll(emp => (int)emp.Salary); Note that the expressions above are logically equivalent to using LINQ’s Select() method, which gives you a lot more power: 1: // get a list of just employee IDs 2: var empIds = employees.Select(emp => emp.Id).ToList(); 3:  4: // get a list of all emp salaries, as int instead of double: 5: var empSalaries = employees.Select(emp => (int)emp.Salary).ToList(); The only difference with using LINQ is that many of the methods (including Select()) are deferred execution, which means that often times they will not perform the conversion for an item until it is requested.  This has both pros and cons in that you gain the benefit of not performing work until it is actually needed, but on the flip side if you want the results now, there is overhead in the behind-the-scenes work that support deferred execution (it’s supported by the yield return / yield break keywords in C# which define iterators that maintain current state information). In general, the new LINQ syntax is preferred, but the older Array and List<T> ConvertAll() methods are still around, as is the Converter<TInput, TOutput> delegate. Sidebar: Variance support update in .NET 4.0 Just like our descriptions of Func and Action, these three early generic delegates also support more variance in assignment as of .NET 4.0.  Their new signatures are: 1: // comparison is contravariant on type being compared 2: public delegate int Comparison<in T>(T lhs, T rhs); 3:  4: // converter is contravariant on input and covariant on output 5: public delegate TOutput Contravariant<in TInput, out TOutput>(TInput input); 6:  7: // predicate is contravariant on input 8: public delegate bool Predicate<in T>(T obj); Thus these delegates can now be assigned to delegates allowing for contravariance (going to a more derived type) or covariance (going to a less derived type) based on whether the parameters are input or output, respectively. Summary Today, we wrapped up our generic delegates discussion by looking at three lesser-used delegates: Predicate<T>, Comparison<T>, and Converter<TInput, TOutput>.  All three of these tend to be replaced by their more generic Func equivalents in LINQ, but that doesn’t mean you shouldn’t understand what they do or can’t use them for your own code, as they do contain semantic meanings in their names that sometimes get lost in the more generic Func name.   Tweet Technorati Tags: C#,CSharp,.NET,Little Wonders,delegates,generics,Predicate,Converter,Comparison

    Read the article

  • C#/.NET &ndash; Finding an Item&rsquo;s Index in IEnumerable&lt;T&gt;

    - by James Michael Hare
    Sorry for the long blogging hiatus.  First it was, of course, the holidays hustle and bustle, then my brother and his wife gave birth to their son, so I’ve been away from my blogging for two weeks. Background: Finding an item’s index in List<T> is easy… Many times in our day to day programming activities, we want to find the index of an item in a collection.  Now, if we have a List<T> and we’re looking for the item itself this is trivial: 1: // assume have a list of ints: 2: var list = new List<int> { 1, 13, 42, 64, 121, 77, 5, 99, 132 }; 3:  4: // can find the exact item using IndexOf() 5: var pos = list.IndexOf(64); This will return the position of the item if it’s found, or –1 if not.  It’s easy to see how this works for primitive types where equality is well defined.  For complex types, however, it will attempt to compare them using EqualityComparer<T>.Default which, in a nutshell, relies on the object’s Equals() method. So what if we want to search for a condition instead of equality?  That’s also easy in a List<T> with the FindIndex() method: 1: // assume have a list of ints: 2: var list = new List<int> { 1, 13, 42, 64, 121, 77, 5, 99, 132 }; 3:  4: // finds index of first even number or -1 if not found. 5: var pos = list.FindIndex(i => i % 2 == 0);   Problem: Finding an item’s index in IEnumerable<T> is not so easy... This is all well and good for lists, but what if we want to do the same thing for IEnumerable<T>?  A collection of IEnumerable<T> has no indexing, so there’s no direct method to find an item’s index.  LINQ, as powerful as it is, gives us many tools to get us this information, but not in one step.  As with almost any problem involving collections, there are several ways to accomplish the same goal.  And once again as with almost any problem involving collections, the choice of the solution somewhat depends on the situation. So let’s look at a few possible alternatives.  I’m going to express each of these as extension methods for simplicity and consistency. Solution: The TakeWhile() and Count() combo One of the things you can do is to perform a TakeWhile() on the list as long as your find condition is not true, and then do a Count() of the items it took.  The only downside to this method is that if the item is not in the list, the index will be the full Count() of items, and not –1.  So if you don’t know the size of the list beforehand, this can be confusing. 1: // a collection of extra extension methods off IEnumerable<T> 2: public static class EnumerableExtensions 3: { 4: // Finds an item in the collection, similar to List<T>.FindIndex() 5: public static int FindIndex<T>(this IEnumerable<T> list, Predicate<T> finder) 6: { 7: // note if item not found, result is length and not -1! 8: return list.TakeWhile(i => !finder(i)).Count(); 9: } 10: } Personally, I don’t like switching the paradigm of not found away from –1, so this is one of my least favorites.  Solution: Select with index Many people don’t realize that there is an alternative form of the LINQ Select() method that will provide you an index of the item being selected: 1: list.Select( (item,index) => do something here with the item and/or index... ) This can come in handy, but must be treated with care.  This is because the index provided is only as pertains to the result of previous operations (if any).  For example: 1: // assume have a list of ints: 2: var list = new List<int> { 1, 13, 42, 64, 121, 77, 5, 99, 132 }; 3:  4: // you'd hope this would give you the indexes of the even numbers 5: // which would be 2, 3, 8, but in reality it gives you 0, 1, 2 6: list.Where(item => item % 2 == 0).Select((item,index) => index); The reason the example gives you the collection { 0, 1, 2 } is because the where clause passes over any items that are odd, and therefore only the even items are given to the select and only they are given indexes. Conversely, we can’t select the index and then test the item in a Where() clause, because then the Where() clause would be operating on the index and not the item! So, what we have to do is to select the item and index and put them together in an anonymous type.  It looks ugly, but it works: 1: // extensions defined on IEnumerable<T> 2: public static class EnumerableExtensions 3: { 4: // finds an item in a collection, similar to List<T>.FindIndex() 5: public static int FindIndex<T>(this IEnumerable<T> list, Predicate<T> finder) 6: { 7: // if you don't name the anonymous properties they are the variable names 8: return list.Select((item, index) => new { item, index }) 9: .Where(p => finder(p.item)) 10: .Select(p => p.index + 1) 11: .FirstOrDefault() - 1; 12: } 13: }     So let’s look at this, because i know it’s convoluted: First Select() joins the items and their indexes into an anonymous type. Where() filters that list to only the ones matching the predicate. Second Select() picks the index of the matches and adds 1 – this is to distinguish between not found and first item. FirstOrDefault() returns the first item found from the previous clauses or default (zero) if not found. Subtract one so that not found (zero) will be –1, and first item (one) will be zero. The bad thing is, this is ugly as hell and creates anonymous objects for each item tested until it finds the match.  This concerns me a bit but we’ll defer judgment until compare the relative performances below. Solution: Convert ToList() and use FindIndex() This solution is easy enough.  We know any IEnumerable<T> can be converted to List<T> using the LINQ extension method ToList(), so we can easily convert the collection to a list and then just use the FindIndex() method baked into List<T>. 1: // a collection of extension methods for IEnumerable<T> 2: public static class EnumerableExtensions 3: { 4: // find the index of an item in the collection similar to List<T>.FindIndex() 5: public static int FindIndex<T>(this IEnumerable<T> list, Predicate<T> finder) 6: { 7: return list.ToList().FindIndex(finder); 8: } 9: } This solution is simplicity itself!  It is very concise and elegant and you need not worry about anyone misinterpreting what it’s trying to do (as opposed to the more convoluted LINQ methods above). But the main thing I’m concerned about here is the performance hit to allocate the List<T> in the ToList() call, but once again we’ll explore that in a second. Solution: Roll your own FindIndex() for IEnumerable<T> Of course, you can always roll your own FindIndex() method for IEnumerable<T>.  It would be a very simple for loop which scans for the item and counts as it goes.  There’s many ways to do this, but one such way might look like: 1: // extension methods for IEnumerable<T> 2: public static class EnumerableExtensions 3: { 4: // Finds an item matching a predicate in the enumeration, much like List<T>.FindIndex() 5: public static int FindIndex<T>(this IEnumerable<T> list, Predicate<T> finder) 6: { 7: int index = 0; 8: foreach (var item in list) 9: { 10: if (finder(item)) 11: { 12: return index; 13: } 14:  15: index++; 16: } 17:  18: return -1; 19: } 20: } Well, it’s not quite simplicity, and those less familiar with LINQ may prefer it since it doesn’t include all of the lambdas and behind the scenes iterators that come with deferred execution.  But does having this long, blown out method really gain us much in performance? Comparison of Proposed Solutions So we’ve now seen four solutions, let’s analyze their collective performance.  I took each of the four methods described above and run them over 100,000 iterations of lists of size 10, 100, 1000, and 10000 and here’s the performance results.  Then I looked for targets at the begining of the list (best case), middle of the list (the average case) and not in the list (worst case as must scan all of the list). Each of the times below is the average time in milliseconds for one execution as computer over the 100,000 iterations: Searches Matching First Item (Best Case)   10 100 1000 10000 TakeWhile 0.0003 0.0003 0.0003 0.0003 Select 0.0005 0.0005 0.0005 0.0005 ToList 0.0002 0.0003 0.0013 0.0121 Manual 0.0001 0.0001 0.0001 0.0001   Searches Matching Middle Item (Average Case)   10 100 1000 10000 TakeWhile 0.0004 0.0020 0.0191 0.1889 Select 0.0008 0.0042 0.0387 0.3802 ToList 0.0002 0.0007 0.0057 0.0562 Manual 0.0002 0.0013 0.0129 0.1255   Searches Where Not Found (Worst Case)   10 100 1000 10000 TakeWhile 0.0006 0.0039 0.0381 0.3770 Select 0.0012 0.0081 0.0758 0.7583 ToList 0.0002 0.0012 0.0100 0.0996 Manual 0.0003 0.0026 0.0253 0.2514   Notice something interesting here, you’d think the “roll your own” loop would be the most efficient, but it only wins when the item is first (or very close to it) regardless of list size.  In almost all other cases though and in particular the average case and worst case, the ToList()/FindIndex() combo wins for performance, even though it is creating some temporary memory to hold the List<T>.  If you examine the algorithm, the reason why is most likely because once it’s in a ToList() form, internally FindIndex() scans the internal array which is much more efficient to iterate over.  Thus, it takes a one time performance hit (not including any GC impact) to create the List<T> but after that the performance is much better. Summary If you’re concerned about too many throw-away objects, you can always roll your own FindIndex() method, but for sheer simplicity and overall performance, using the ToList()/FindIndex() combo performs best on nearly all list sizes in the average and worst cases.    Technorati Tags: C#,.NET,Litte Wonders,BlackRabbitCoder,Software,LINQ,List

    Read the article

< Previous Page | 71 72 73 74 75 76 77 78 79 80 81 82  | Next Page >