Search Results

Search found 40741 results on 1630 pages for 'technology java se'.

Page 269/1630 | < Previous Page | 265 266 267 268 269 270 271 272 273 274 275 276 | Next Page >

Parsing flat files using SSIS : SSIS Nugget

- by jamiet

Often when using SQL Server Integration Services (SSIS) you will find there is more than one way of accomplishing a task and that the most obvious method of doing so might not be the optimal one. In the video below I demonstrate this by way of an experiment using SSIS’s Flat File Source component; I show different ways that you can pull data from a flat file into the SSIS dataflow and also how the nature of the data itself can influence your choice as to how this task should be accomplished. If you are having trouble viewing the video in your blog reader then head to http://sqlblog.com/blogs/jamie_thomson/archive/2010/03/25/parsing-flat-files-using-ssis-ssis-nugget.aspx to see it as it is hosted on my blog! The main point I want to get across from this video is that a little bit of creative thinking when building your dataflows can sometimes be very beneficial for performance; quite often building a solution that isn’t the most obvious might actually turn out to be the best one. You’ll notice, if you have watched the video, that my editing skills weren’t quite up to snuff and I cut off the final few words however all I was saying was that if you have any feedback on this video then I would love to hear it either via email or preferably the comments section below. I hope this turns out to be useful to some of you. @Jamiet P.S. Incidentally the parsing that we do using SSIS expressions in the video would be much easier if we had a TOKENISE function in SSIS’s expression language and I have asked for the introduction of such a function on Connect at [SSIS] TOKEN(string, tokeniser_string, occurence) function. Feel free to go and vote that up if you think this feature would be useful! Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Introducing SSIS Reporting Pack for SQL Server code-named Denali

- by jamiet

In recent blog posts I have introduced the new SSIS Catalog that is forthcoming in SQL Server Code-named Denali: What's new in SSIS in Denali Introduction to SSIS Projects in Denali Parameters in SSIS In Denali SSIS Server, Catalogs, Environments and Environment Variables in SSIS in Denali The SSIS Catalog is responsible for executing SSIS packages and also for capturing the metadata from those executions. However, at the time of writing there is no mechanism provided to view analyse and drill into that metadata and that is the reason that I am, in this blog post, introducing a suite of SSIS Catalog reports called the SSIS Reporting Pack which you can download from my SkyDrive at http://cid-550f681dad532637.office.live.com/self.aspx/Public/SSIS%20Reporting%20Pack/SSISReportingPack%20v0.1.zip. In this first release the SSIS Reporting Pack includes five reports: Catalog – A high-level summary of all activity in the Catalog Folders – A summary of activity in each Catalog Folder Folder – Project-level activity per single Folder Executions – A visualisation of all executions per Folder/Project/Package/Environment or subset thereof Execution – Information about an individual execution Here is a screenshot of the Executions report: Notice that the SSIS Reporting Pack provides a visual overview of all executions in the Catalog. Each execution is represented as a bar on the bar chart, the success or otherwise of each execution is indicated by the colour of the bar and the execution time is indicated by the bar height. I have recorded a video that gives an overview of the SSIS Reporting which I have embedded below. If you are having any trouble viewing the video go see it at http://vimeo.com/17617974 I must stress that this is a very early version of the SSIS Reporting Pack and I am expecting it to change a lot over the coming year. I am very keen to get some feedback about this, specifically: let me know if anything does not work as you expect give me your feature requests The easiest way to get hold of of me for now is within the comments section of this blog post. That’s all for now. I hope the SSIS Reporting Pack proves useful and I look forward to hearing your feedback. Lastly, that download link again: http://cid-550f681dad532637.office.live.com/self.aspx/Public/SSIS%20Reporting%20Pack/SSISReportingPack%20v0.1.zip. @jamiet

Read the article
FileNameColumnName property, Flat File Source Adapter : SSIS Nugget

- by jamiet

I saw a question on MSDN’s SSIS forum the other day that went something like this: I’m loading data into a table from a flat file but I want to be able to store the name of that file as well. Is there a way of doing that? I don’t want to come across as disrespecting those who took the time to reply but there was a few answers along the lines of “loop over the files using a For Each, store the file name in a variable yadda yadda yadda” when in fact there is a much much simpler way of accomplishing this; it just happens to be a little hidden away as I shall now explain! The Flat File Source Adapter has a property called FileNameColumnName which for some reason it isn’t exposed through the Flat File Source editor, it is however exposed via the Advanced Properties: You’ll see in the screenshot above that I have set FileNameColumnName=“Filename” (it doesn’t matter what name you use, anything except a non-zero string will work). What this will do is create a new column in our dataflow called “Filename” that contains, unsurprisingly, the name of the file from which the row was sourced. All very simple. This is particularly useful if you are extracting data from multiple files using the MultiFlatFile Connection Manager as it allows you to differentiate between data from each of the files as you can see in the following screenshot: So there you have it, the FileNameColumnName property; a little known secret of SSIS. I hope it proves to be useful to someone out there. @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
SSIS Lookup component tuning tips

- by jamiet

Yesterday evening I attended a London meeting of the UK SQL Server User Group at Microsoft’s offices in London Victoria. As usual it was both a fun and informative evening and in particular there seemed to be a few questions arising about tuning the SSIS Lookup component; I rattled off some comments and figured it would be prudent to drop some of them into a dedicated blog post, hence the one you are reading right now. Scene setting A popular pattern in SSIS is to use a Lookup component to determine whether a record in the pipeline already exists in the intended destination table or not and I cover this pattern in my 2006 blog post Checking if a row exists and if it does, has it changed? (note to self: must rewrite that blog post for SSIS2008). Fundamentally the SSIS lookup component (when using FullCache option) sucks some data out of a database and holds it in memory so that it can be compared to data in the pipeline. One of the big benefits of using SSIS dataflows is that they process data one buffer at a time; that means that not all of the data from your source exists in the dataflow at the same time and is why a SSIS dataflow can process data volumes that far exceed the available memory. However, that only applies to data in the pipeline; for reasons that are hopefully obvious ALL of the data in the lookup set must exist in the memory cache for the duration of the dataflow’s execution which means that any memory used by the lookup cache will not be available to be used as a pipeline buffer. Moreover, there’s an obvious correlation between the amount of data in the lookup cache and the time it takes to charge that cache; the more data you have then the longer it will take to charge and the longer you have to wait until the dataflow actually starts to do anything. For these reasons your goal is simple: ensure that the lookup cache contains as little data as possible. General tips Here is a simple tick list you can follow in order to tune your lookups: Use a SQL statement to charge your cache, don’t just pick a table from the dropdown list made available to you. (Read why in SELECT *... or select from a dropdown in an OLE DB Source component?) Only pick the columns that you need, ignore everything else Make the database columns that your cache is populated from as narrow as possible. If a column is defined as VARCHAR(20) then SSIS will allocate 20 bytes for every value in that column – that is a big waste if the actual values are significantly less than 20 characters in length. Do you need DT_WSTR typed columns or will DT_STR suffice? DT_WSTR uses twice the amount of space to hold values that can be stored using a DT_STR so if you can use DT_STR, consider doing so. Same principle goes for the numerical datatypes DT_I2/DT_I4/DT_I8. Only populate the cache with data that you KNOW you will need. In other words, think about your WHERE clause! Thinking outside the box It is tempting to build a large monolithic dataflow that does many things, one of which is a Lookup. Often though you can make better use of your available resources by, well, mixing things up a little and here are a few ideas to get your creative juices flowing: There is no rule that says everything has to happen in a single dataflow. If you have some particularly resource intensive lookups then consider putting that lookup into a dataflow all of its own and using raw files to pass the pipeline data in and out of that dataflow. Know your data. If you think, for example, that the majority of your incoming rows will match with only a small subset of your lookup data then consider chaining multiple lookup components together; the first would use a FullCache containing that data subset and the remaining data that doesn’t find a match could be passed to a second lookup that perhaps uses a NoCache lookup thus negating the need to pull all of that least-used lookup data into memory. Do you need to process all of your incoming data all at once? If you can process different partitions of your data separately then you can partition your lookup cache as well. For example, if you are using a lookup to convert a location into a [LocationId] then why not process your data one region at a time? This will mean your lookup cache only has to contain data for the location that you are currently processing and with the ability of the Lookup in SSIS2008 and beyond to charge the cache using a dynamically built SQL statement you’ll be able to achieve it using the same dataflow and simply loop over it using a ForEach loop. Taking the previous data partitioning idea further … a dataflow can contain more than one data path so why not split your data using a conditional split component and, again, charge your lookup caches with only the data that they need for that partition. Lookups have two uses: to (1) find a matching row from the lookup set and (2) put attributes from that matching row into the pipeline. Ask yourself, do you need to do these two things at the same time? After all once you have the key column(s) from your lookup set then you can use that key to get the rest of attributes further downstream, perhaps even in another dataflow. Are you using the same lookup data set multiple times? If so, consider the file caching option in SSIS 2008 and beyond. Above all, experiment and be creative with different combinations. You may be surprised at what works. Final thoughts If you want to know more about how the Lookup component differs in SSIS2008 from SSIS2005 then I have a dedicated blog post about that at Lookup component gets a makeover. I am on a mini-crusade at the moment to get a BULK MERGE feature into the database engine, the thinking being that if the database engine can quickly merge massive amounts of data in a similar manner to how it can insert massive amounts using BULK INSERT then that’s a lot of work that wouldn’t have to be done in the SSIS pipeline. If you think that is a good idea then go and vote for BULK MERGE on Connect. If you have any other tips to share then please stick them in the comments. Hope this helps! @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Dynamic Unpivot : SSIS Nugget

- by jamiet

A question on the SSIS forum earlier today asked: I need to dynamically unpivot some set of columns in my source file. Every month there is one new column and its set of Values. I want to unpivot it without editing my SSIS packages that is deployed Let’s be clear about what we mean by Unpivot. It is a normalisation technique that basically converts columns into rows. By way of example it converts something like this: AccountCode Jan Feb Mar AC1 100.00 150.00 125.00 AC2 45.00 75.50 90.00 into something like this: AccountCode Month Amount AC1 Jan 100.00 AC1 Feb 150.00 AC1 Mar 125.00 AC2 Jan 45.00 AC2 Feb 75.50 AC2 Mar 90.00 The Unpivot transformation in SSIS is perfectly capable of carrying out the operation defined in this example however in the case outlined in the aforementioned forum thread the problem was a little bit different. I interpreted it to mean that the number of columns could change and in that scenario the Unpivot transformation (and indeed the SSIS dataflow in general) is rendered useless because it expects that the number of columns will not change from what is specified at design-time. There is a workaround however. Assuming all of the columns that CAN exist will appear at the end of the rows, we can (1) import all of the columns in the file as just a single column, (2) use a script component to loop over all the values in that “column” and (3) output each one as a column all of its own. Let’s go over that in a bit more detail. I’ve prepared a data file that shows some data that we want to unpivot which shows some customers and their mythical shopping lists (it has column names in the first row): We use a Flat File Connection Manager to specify the format of our data file to SSIS: and a Flat File Source Adapter to put it into the dataflow (no need a for a screenshot of that one – its very basic). Notice that the values that we want to unpivot all exist in a column called [Groceries]. Now onto the script component where the real work goes on, although the code is pretty simple: Here I show a screenshot of this executing along with some data viewers. As you can see we have successfully pulled out all of the values into a row all of their own thus accomplishing the Dynamic Unpivot that the forum poster was after. If you want to run the demo for yourself then I have uploaded the demo package and source file up to my SkyDrive: http://cid-550f681dad532637.skydrive.live.com/self.aspx/Public/BlogShare/20100529/Dynamic%20Unpivot.zip Simply extract the two files into a folder, make sure the Connection Manager is pointing to the file, and execute! Hope this is useful. @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
SQL SERVER – Simple Example of Snapshot Isolation – Reduce the Blocking Transactions

- by pinaldave

To learn any technology and move to a more advanced level, it is very important to understand the fundamentals of the subject first. Today, we will be talking about something which has been quite introduced a long time ago but not properly explored when it comes to the isolation level. Snapshot Isolation was introduced in SQL Server in 2005. However, the reality is that there are still many software shops which are using the SQL Server 2000, and therefore cannot be able to maintain the Snapshot Isolation. Many software shops have upgraded to the later version of the SQL Server, but their respective developers have not spend enough time to upgrade themselves with the latest technology. “It works!” is a very common answer of many when they are asked about utilizing the new technology, instead of backward compatibility commands. In one of the recent consultation project, I had same experience when developers have “heard about it” but have no idea about snapshot isolation. They were thinking it is the same as Snapshot Replication – which is plain wrong. This is the same demo I am including here which I have created for them. In Snapshot Isolation, the updated row versions for each transaction are maintained in TempDB. Once a transaction has begun, it ignores all the newer rows inserted or updated in the table. Let us examine this example which shows the simple demonstration. This transaction works on optimistic concurrency model. Since reading a certain transaction does not block writing transaction, it also does not block the reading transaction, which reduced the blocking. First, enable database to work with Snapshot Isolation. Additionally, check the existing values in the table from HumanResources.Shift. ALTER DATABASE AdventureWorks SET ALLOW_SNAPSHOT_ISOLATION ON GO SELECT ModifiedDate FROM HumanResources.Shift GO Now, we will need two different sessions to prove this example. First Session: Set Transaction level isolation to snapshot and begin the transaction. Update the column “ModifiedDate” to today’s date. -- Session 1 SET TRANSACTION ISOLATION LEVEL SNAPSHOT BEGIN TRAN UPDATE HumanResources.Shift SET ModifiedDate = GETDATE() GO Please note that we have not yet been committed to the transaction. Now, open the second session and run the following “SELECT” statement. Then, check the values of the table. Please pay attention on setting the Isolation level for the second one as “Snapshot” at the same time when we already start the transaction using BEGIN TRAN. -- Session 2 SET TRANSACTION ISOLATION LEVEL SNAPSHOT BEGIN TRAN SELECT ModifiedDate FROM HumanResources.Shift GO You will notice that the values in the table are still original values. They have not been modified yet. Once again, go back to session 1 and begin the transaction. -- Session 1 COMMIT After that, go back to Session 2 and see the values of the table. -- Session 2 SELECT ModifiedDate FROM HumanResources.Shift GO You will notice that the values are yet not changed and they are still the same old values which were there right in the beginning of the session. Now, let us commit the transaction in the session 2. Once committed, run the same SELECT statement once more and see what the result is. -- Session 2 COMMIT SELECT ModifiedDate FROM HumanResources.Shift GO You will notice that it now reflects the new updated value. I hope that this example is clear enough as it would give you good idea how the Snapshot Isolation level works. There is much more to write about an extra level, READ_COMMITTED_SNAPSHOT, which we will be discussing in another post soon. If you wish to use this transaction’s Isolation level in your production database, I would appreciate your comments about their performance on your servers. I have included here the complete script used in this example for your quick reference. ALTER DATABASE AdventureWorks SET ALLOW_SNAPSHOT_ISOLATION ON GO SELECT ModifiedDate FROM HumanResources.Shift GO -- Session 1 SET TRANSACTION ISOLATION LEVEL SNAPSHOT BEGIN TRAN UPDATE HumanResources.Shift SET ModifiedDate = GETDATE() GO -- Session 2 SET TRANSACTION ISOLATION LEVEL SNAPSHOT BEGIN TRAN SELECT ModifiedDate FROM HumanResources.Shift GO -- Session 1 COMMIT -- Session 2 SELECT ModifiedDate FROM HumanResources.Shift GO -- Session 2 COMMIT SELECT ModifiedDate FROM HumanResources.Shift GO Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Pinal Dave, SQL, SQL Authority, SQL Performance, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Transaction Isolation

Read the article
FileNameColumnName property, Flat File Source Adapter : SSIS Nugget

- by jamiet

I saw a question on MSDN’s SSIS forum the other day that went something like this: I’m loading data into a table from a flat file but I want to be able to store the name of that file as well. Is there a way of doing that? I don’t want to come across as disrespecting those who took the time to reply but there was a few answers along the lines of “loop over the files using a For Each, store the file name in a variable yadda yadda yadda” when in fact there is a much much simpler way of accomplishing this; it just happens to be a little hidden away as I shall now explain! The Flat File Source Adapter has a property called FileNameColumnName which for some reason it isn’t exposed through the Flat File Source editor, it is however exposed via the Advanced Properties: You’ll see in the screenshot above that I have set FileNameColumnName=“Filename” (it doesn’t matter what name you use, anything except a non-zero string will work). What this will do is create a new column in our dataflow called “Filename” that contains, unsurprisingly, the name of the file from which the row was sourced. All very simple. This is particularly useful if you are extracting data from multiple files using the MultiFlatFile Connection Manager as it allows you to differentiate between data from each of the files as you can see in the following screenshot: So there you have it, the FileNameColumnName property; a little known secret of SSIS. I hope it proves to be useful to someone out there. @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Inequality joins, Asynchronous transformations and Lookups : SSIS

- by jamiet

It is pretty much accepted by SQL Server Integration Services (SSIS) developers that synchronous transformations are generally quicker than asynchronous transformations (for a description of synchronous and asynchronous transformations go read Asynchronous and synchronous data flow components). Notice I said “generally” and not “always”; there are circumstances where using asynchronous transformations can be beneficial and in this blog post I’ll demonstrate such a scenario, one that is pretty common when building data warehouses. Imagine I have a [Customer] dimension table that manages information about all of my customers as a slowly-changing dimension. If that is a type 2 slowly changing dimension then you will likely have multiple rows per customer in that table. Furthermore you might also have datetime fields that indicate the effective time period of each member record. Here is such a table that contains data for four dimension members {Terry, Max, Henry, Horace}: Notice that we have multiple records per customer and that the [SCDStartDate] of a record is equivalent to the [SCDEndDate] of the record that preceded it (if there was one). (Note that I am on record as saying I am not a fan of this technique of storing an [SCDEndDate] but for the purposes of clarity I have included it here.) Anyway, the idea here is that we will have some incoming data containing [CustomerName] & [EffectiveDate] and we need to use those values to lookup [Customer].[CustomerId]. The logic will be: Lookup a [CustomerId] WHERE [CustomerName]=[CustomerName] AND [SCDStartDate] <= [EffectiveDate] AND [EffectiveDate] <= [SCDEndDate] The conventional approach to this would be to use a full cached lookup but that isn’t an option here because we are using inequality conditions. The obvious next step then is to use a non-cached lookup which enables us to change the SQL statement to use inequality operators: Let’s take a look at the dataflow: Notice these are all synchronous components. This approach works just fine however it does have the limitation that it has to issue a SQL statement against your lookup set for every row thus we can expect the execution time of our dataflow to increase linearly in line with the number of rows in our dataflow; that’s not good. OK, that’s the obvious method. Let’s now look at a different way of achieving this using an asynchronous Merge Join transform coupled with a Conditional Split. I’ve shown it post-execution so that I can include the row counts which help to illustrate what is going on here: Notice that there are more rows output from our Merge Join component than on the input. That is because we are joining on [CustomerName] and, as we know, we have multiple records per [CustomerName] in our lookup set. Notice also that there are two asynchronous components in here (the Sort and the Merge Join). I have embedded a video below that compares the execution times for each of these two methods. The video is just over 8minutes long. View on Vimeo For those that can’t be bothered watching the video I’ll tell you the results here. The dataflow that used the Lookup transform took 36 seconds whereas the dataflow that used the Merge Join took less than two seconds. An illustration in case it is needed: Pretty conclusive proof that in some scenarios it may be quicker to use an asynchronous component than a synchronous one. Your mileage may of course vary. The scenario outlined here is analogous to performance tuning procedural SQL that uses cursors. It is common to eliminate cursors by converting them to set-based operations and that is effectively what we have done here. Our non-cached lookup is performing a discrete operation for every single row of data, exactly like a cursor does. By eliminating this cursor-in-disguise we have dramatically sped up our dataflow. I hope all of that proves useful. You can download the package that I demonstrated in the video from my SkyDrive at http://cid-550f681dad532637.skydrive.live.com/self.aspx/Public/BlogShare/20100514/20100514%20Lookups%20and%20Merge%20Joins.zip Comments are welcome as always. @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Enforce SSIS naming conventions using BI-xPress

- by jamiet

A long long long time ago (in 2006 in fact) I published a blog post entitled Suggested Best Practises and naming conventions in which I suggested a bunch of acronyms that folks could use to prefix object names in their SSIS packages, thus allowing easier identification of those objects in log records, here is a sample of some of those suggestions: If you have adopted these naming conventions (and I am led to believe that a bunch of people have) then you might like to know that you can now check for adherence to these conventions using a tool called BI-xPress from Pragmatic Works. BI-xPress includes a feature called the Best Practices Analyzer that scans your packages and assess them according to some rules that you specify. In addition Pragmatic Works have made available a collection of these rules that adhere to the naming conventions I specified in 2006 You can download this collection however I recommend you first read the accompanying article that demonstrates the capabilities of the Best Practices Analyzer. Pretty cool stuff. @Jamiet

Read the article
Querying the SSIS Catalog? Here’s a handy query!

- by jamiet

I’ve been working on a SQL Server Integration Services (SSIS) solution for about 6 months now and I’ve learnt many many things that I intend to share on this blog just as soon as I get the time. Here’s a very short starter-for-ten… I’ve found the following query to be utterly invaluable when interrogating the SSIS Catalog to discover what is going on in my executions: SELECT event_message_id,MESSAGE,package_name,event_name,message_source_name,package_path,execution_path,message_type,message_source_typeFROM ( SELECT em.* FROM SSISDB.catalog.event_messages em WHERE em.operation_id = (SELECT MAX(execution_id) FROM SSISDB.catalog.executions) AND event_name NOT LIKE '%Validate%' )q/* Put in whatever WHERE predicates you might like*/--WHERE event_name = 'OnError'--WHERE package_name = 'Package.dtsx'--WHERE execution_path LIKE '%<some executable>%'ORDER BY message_time DESC Know it. Learn it. Love it. @jamiet

Read the article
Java Spotlight Episode 113: John Ceccarelli on Netbeans @JCeccarelli1

- by Roger Brinkley

Interview with John Ceccarelli on Netbeans. Right-click or Control-click to download this MP3 file. You can also subscribe to the Java Spotlight Podcast Feed to get the latest podcast automatically. If you use iTunes you can open iTunes and subscribe with this link: Java Spotlight Podcast in iTunes. Show Notes News JCP Star Spec Leads 2012 Nominations open now until 31 December Java EE 7 Survey Results JavaFX for Tablets Survey JavaFX Scene Builder - Developer Preview Release Oracle JDK 7u10 released with new security features jtreg update, December 2012 Food For Tests: 7u12 Build b05, 8 b68 Preview Builds + Builds with Lambda & Type Annotation Support Developer Preview of Java SE 8 (with JavaFX) for ARM Project Nashorn: The Vote Is In Events Dec 20, 9:30am JCP Spec Lead Call December on Developing a TCK Jan 15-16, JCP EC Face to Face Meeting, West Coast USA Jan 14-17, IOUG, Redwood Shores Jan 29-31, Distributech, San Diego Feb 2-3 FOSDEM, Brussels Feb 4-6 Jfokus, Sweden Feature Interview John Jullion-Ceccarelli is the head of engineering for the NetBeans open source project and for the VisualVM Java profiler. John started with Sun Microsystems in 2001 as a technical writer and has since held a variety of positions including technical publications manager, engineering manager, and NetBeans IDE 6.9 Release Boss. He recently relocated to the San Francisco Bay Area after 13 years living in Prague, the Czech Republic. What’s Cool Glassfish is 3 years old Arduino/Raspberry-Pi/JavaFX mash-up by Jose Pereda Early Access of Drombler FX for building modular JavaFX applications with OSGi and Maven Eclipse Modeling Framework Support coming for e(fx)clipse 8003562: Provide a command-line tool to find static dependencies Duke’s Choice Awards Winners LAD - includes JCP EC Member TOTVS London Java Community and SouJava jointly win JCP member of the year

Read the article
SSIS Prehistory video

- by jamiet

I’m currently wasting spending my Easter bank holiday putting together my presentation SSIS Dataflow Performance Tuning for the upcoming SQL Bits conference in London and in doing so I’m researching some old material about how the dataflow actually works. Boring as it is I’ve gotten easily sidelined and have chanced upon an old video on Channel 9 entitled Euan Garden - Tour of SQL Server Team (part I). Euan is a former member of the SQL Server team and in this series of videos he walks the halls of the SQL Server building on Microsoft’s Redmond campus talking to some of the various protagonists and in this one he happens upon the SQL Server Integration Services team. The video was shot in 2004 so this is a fascinating (to me anyway) glimpse into the development of SSIS from before it was ever shipped and if you’re a geek like me you’ll really enjoy this behind-the-scenes look into how and why the product was architected. The video is also notable for the presence of the cameraman – none other than the now-rather-more-famous-than-he-was-then Robert Scoble. See it at http://channel9.msdn.com/posts/TheChannel9Team/Euan-Garden-Tour-of-SQL-Server-Team-part-I/ Enjoy! @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
SSIS Catalog, Windows updates and deployment failures due to System.Core mismatch

- by jamiet

This is a heads-up for anyone doing development on SSIS. On my current project where we are implementing a SQL Server Integration Services (SSIS) 2012 solution we recently encountered a situation where we were unable to deploy any of our projects even though we had successfully deployed in the past. Any attempt to use the deployment wizard resulted in this error dialog: The text of the error (for all you search engine crawlers out there) was: A .NET Framework error occurred during execution of user-defined routine or aggregate "create_key_information": System.IO.FileLoadException: Could not load file or assembly 'System.Core, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' or one of its dependencies. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040) ---> System.IO.FileLoadException: The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040) System.IO.FileLoadException: System.IO.FileLoadException: at Microsoft.SqlServer.IntegrationServices.Server.Security.CryptoGraphy.CreateSymmetricKey(String algorithm) at Microsoft.SqlServer.IntegrationServices.Server.Security.CryptoGraphy.CreateKeyInformation(SqlString algorithmName, SqlBytes& key, SqlBytes& IV) . (Microsoft SQL Server, Error: 6522) After some investigation and a bit of back and forth with some very helpful members of the SSIS product team (hey Matt, Wee Hyong) it transpired that this was due to a .Net Framework fix that had been delivered via Windows Update. I took a look at the server update history and indeed there have been some recently applied .Net Framework updates: This fix had (in the words of Matt Masson) “somehow caused a mismatch on System.Core for SQLCLR” and, as you may know, SQLCLR is used heavily within the SSIS Catalog. The fix was pretty simple – restart SQL Server. This causes the assemblies to be upgraded automatically. If you are using Data Quality Services (DQS) you may have experienced similar problems which are documented at Upgrade SQLCLR Assemblies After .NET Framework Update. I am hoping the SSIS team will follow-up with a more thorough explanation on their blog soon. You DBAs out there may be questioning why Windows Update is set to automatically apply updates on our production servers. We’re checking that out with our hosting provider right now You have been warned! @Jamiet

Read the article
Using Find/Replace with regular expressions inside a SSIS package

- by jamiet

Another one of those might-be-useful-again-one-day-so-I’ll-share-it-in-a-blog-post blog posts I am currently working on a SQL Server Integration Services (SSIS) 2012 implementation where each package contains a parameter called ETLIfcHist_ID: During normal execution this will get altered when the package is executed from the Execute Package Task however we want to make sure that at deployment-time they all have a default value of –1. Of course, they tend to get changed during development so I wanted a way of easily changing them all back to the default value. Opening up each package in turn and editing them was an option but given that we have over 40 packages and we might want to carry out this reset fairly frequently I needed a more automated method so I turned to Visual Studio’s Find/Replace… feature Of course, we don’t know what value will be in that parameter so I can’t simply search for a particular value; hence I opted to use a regular expression to identify the value to be change. In the rest of this blog post I’ll explain how to do that. For demonstration purposes I have taken the contents of a .dtsx file and stripped out everything except the element containing the parameters (<DTS:PackageParameters>), if you want to play along at home you can copy-paste the XML document below into a new XML file and open it up in Visual Studio: <?xml version="1.0"?> <DTS:Executable xmlns:DTS="www.microsoft.com/SqlServer/Dts"> <DTS:PackageParameters> <DTS:PackageParameter DTS:CreationName="" DTS:DataType="3" DTS:Description="InterfaceHistory_ID: used for Lineage" DTS:DTSID="{635616DB-EEEE-45C8-89AA-713E25846C7E}" DTS:ObjectName="ETLIfcHist_ID"> <DTS:Property DTS:DataType="3" DTS:Name="ParameterValue">VALUE_TO_BE_CHANGED</DTS:Property> </DTS:PackageParameter> <DTS:PackageParameter DTS:CreationName="" DTS:DataType="3" DTS:Description="Some other description" DTS:DTSID="{635616DB-EEEE-45C8-89AA-713E25845C7E}" DTS:ObjectName="SomeOtherObjectName"> <DTS:Property DTS:DataType="3" DTS:Name="ParameterValue">SomeOtherValue</DTS:Property> </DTS:PackageParameter> </DTS:PackageParameters> </DTS:Executable> We are trying to identify the value of the parameter whose name is ETLIfcHist_ID – notice that in the XML document above that value is VALUE_TO_BE_CHANGED. The following regular expression will find the appropriate portion of the XML document: {\<DTS\:PackageParameter[\n ]*DTS\:CreationName="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:DataType="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:Description="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:DTSID="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:ObjectName="ETLIfcHist_ID"\>[\n ]*\<DTS\:Property[\n ]*DTS\:DataType="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:Name="ParameterValue"\>}[A-Za-z0-9\:_\{\}- ]*{\<\/DTS\:Property\>} I have highlighted the name of the parameter that we’re looking for. I have also highlighted two portions identified by pairs of curly braces “{…}”; these are important because they pick out the two portions either side of the value I want to replace, in other words the portions highlighted here: <DTS:PackageParameters> <DTS:PackageParameter DTS:CreationName="" DTS:DataType="3" DTS:Description="InterfaceHistory_ID: used for Lineage" DTS:DTSID="{635616DB-EEEE-45C8-89AA-713E25846C7E}" DTS:ObjectName="ETLIfcHist_ID"> <DTS:Property DTS:DataType="3" DTS:Name="ParameterValue">VALUE_TO_BE_CHANGED</DTS:Property> </DTS:PackageParameter> Those sections in the curly braces are termed tag expressions and can be identified in the replace expression using a backslash and a number identifying which tag expression you’re referring to according to its ordinal position. Hence, our replace expression is simply: \1-1\2 We’re saying the portion of our file identified by the regular expression should be replaced by the first curly brace section, then the literal –1, then the second curly brace section. Make sense? Give it a go yourself by plugging those two expressions into Visual Studio’s Find and Replace dialog. If you set it to look in “All Open Documents” then you can open up the code-behind of all your packages and change all of them at once. The Find and Replace dialog will look like this: That’s it! I realise that not everyone will be looking to change the value of a parameter but hopefully I have shown you a technique that you can modify to work for your own scenario. Given that this blog post is, y’know, on the web I have no doubt that someone is going to find a fault with my find regex expression and if that person is you….that’s OK. Let me know about it in the comments below and perhaps we can work together to come up with something better! Note that some parameters may have a different set of properties (for example some, but not all, of my parameters have a DTS:Required attribute) so your find regular expression may have to change accordingly. When researching this I found the following article to be invaluable: Visual Studio Find/Replace Regular Expression Usage @Jamiet

Read the article
Investigation: Can different combinations of components effect Dataflow performance?

- by jamiet

Introduction The Dataflow task is one of the core components (if not the core component) of SQL Server Integration Services (SSIS) and often the most misunderstood. This is not surprising, its an incredibly complicated beast and we’re abstracted away from that complexity via some boxes that go yellow red or green and that have some lines drawn between them. Example dataflow In this blog post I intend to look under that facade and get into some of the nuts and bolts of the Dataflow Task by investigating how the decisions we make when building our packages can affect performance. I will do this by comparing the performance of three dataflows that all have the same input, all produce the same output, but which all operate slightly differently by way of having different transformation components. I also want to use this blog post to challenge a common held opinion that I see perpetuated over and over again on the SSIS forum. That is, that people assume adding components to a dataflow will be detrimental to overall performance. Its not surprising that people think this –it is intuitive to think that more components means more work- however this is not a view that I share. I have always been of the opinion that there are many factors affecting dataflow duration and the number of components is actually one of the less important ones; having said that I have never proven that assertion and that is one reason for this investigation. I have actually seen evidence that some people think dataflow duration is simply a function of number of rows and number of components. I’ll happily call that one out as a myth even without any investigation! The Setup I have a 2GB datafile which is a list of 4731904 (~4.7million) customer records with various attributes against them and it contains 2 columns that I am going to use for categorisation: [YearlyIncome] [BirthDate] The data file is a SSIS raw format file which I chose to use because it is the quickest way of getting data into a dataflow and given that I am testing the transformations, not the source or destination adapters, I want to minimise external influences as much as possible. In the test I will split the customers according to month of birth (12 of those) and whether or not their yearly income is above or below 50000 (2 of those); in other words I will be splitting them into 24 discrete categories and in order to do it I shall be using different combinations of SSIS’ Conditional Split and Derived Column transformation components. The 24 datapaths that occur will each input to a rowcount component, again because this is the least resource intensive means of terminating a datapath. The test is being carried out on a Dell XPS Studio laptop with a quad core (8 logical Procs) Intel Core i7 at 1.73GHz and Samsung SSD hard drive. Its running SQL Server 2008 R2 on Windows 7. The Variables Here are the three combinations of components that I am going to test: One Conditional Split - A single Conditional Split component CSPL Split by Month of Birth and income category that will use expressions on [YearlyIncome] & [BirthDate] to send each row to one of 24 outputs. This next screenshot displays the expression logic in use: Derived Column & Conditional Split - A Derived Column component DER Income Category that adds a new column [IncomeCategory] which will contain one of two possible text values {“LessThan50000”,”GreaterThan50000”} and uses [YearlyIncome] to determine which value each row should get. A Conditional Split component CSPL Split by Month of Birth and Income Category then uses that new column in conjunction with [BirthDate] to determine which of the same 24 outputs to send each row to. Put more simply, I am separating the Conditional Split of #1 into a Derived Column and a Conditional Split. The next screenshots display the expression logic in use: DER Income Category CSPL Split by Month of Birth and Income Category Three Conditional Splits - A Conditional Split component that produces two outputs based on [YearlyIncome], one for each Income Category. Each of those outputs will go to a further Conditional Split that splits the input into 12 outputs, one for each month of birth (identical logic in each). In this case then I am separating the single Conditional Split of #1 into three Conditional Split components. The next screenshots display the expression logic in use: CSPL Split by Income Category CSPL Split by Month of Birth 1& 2 Each of these combinations will provide an input to one of the 24 rowcount components, just the same as before. For illustration here is a screenshot of the dataflow containing three Conditional Split components: As you can these dataflows have a fair bit of work to do and remember that they’re doing that work for 4.7million rows. I will execute each dataflow 10 times and use the average for comparison. I foresee three possible outcomes: The dataflow containing just one Conditional Split (i.e. #1) will be quicker There is no significant difference between any of them One of the two dataflows containing multiple transformation components will be quicker Regardless of which of those outcomes come to pass we will have learnt something and that makes this an interesting test to carry out. Note that I will be executing the dataflows using dtexec.exe rather than hitting F5 within BIDS. The Results and Analysis The table below shows all of the executions, 10 for each dataflow. It also shows the average for each along with a standard deviation. All durations are in seconds. I’m pasting a screenshot because I frankly can’t be bothered with the faffing about needed to make a presentable HTML table. It is plain to see from the average that the dataflow containing three conditional splits is significantly faster, the other two taking 43% and 52% longer respectively. This seems strange though, right? Why does the dataflow containing the most components outperform the other two by such a big margin? The answer is actually quite logical when you put some thought into it and I’ll explain that below. Before progressing, a side note. The standard deviation for the “Three Conditional Splits” dataflow is orders of magnitude smaller – indicating that performance for this dataflow can be predicted with much greater confidence too. The Explanation I refer you to the screenshot above that shows how CSPL Split by Month of Birth and salary category in the first dataflow is setup. Observe that there is a case for each combination of Month Of Date and Income Category – 24 in total. These expressions get evaluated in the order that they appear and hence if we assume that Month of Date and Income Category are uniformly distributed in the dataset we can deduce that the expected number of expression evaluations for each row is 12.5 i.e. 1 (the minimum) + 24 (the maximum) divided by 2 = 12.5. Now take a look at the screenshots for the second dataflow. We are doing one expression evaluation in DER Income Category and we have the same 24 cases in CSPL Split by Month of Birth and Income Category as we had before, only the expression differs slightly. In this case then we have 1 + 12.5 = 13.5 expected evaluations for each row – that would account for the slightly longer average execution time for this dataflow. Now onto the third dataflow, the quick one. CSPL Split by Income Category does a maximum of 2 expression evaluations thus the expected number of evaluations per row is 1.5. CSPL Split by Month of Birth 1 & CSPL Split by Month of Birth 2 both have less work to do than the previous Conditional Split components because they only have 12 cases to test for thus the expected number of expression evaluations is 6.5 There are two of them so total expected number of expression evaluations for this dataflow is 6.5 + 6.5 + 1.5 = 14.5. 14.5 is still more than 12.5 & 13.5 though so why is the third dataflow so much quicker? Simple, the conditional expressions in the first two dataflows have two boolean predicates to evaluate – one for Income Category and one for Month of Birth; the expressions in the Conditional Split in the third dataflow however only have one predicate thus they are doing a lot less work. To sum up, the difference in execution times can be attributed to the difference between: MONTH(BirthDate) == 1 && YearlyIncome <= 50000 and MONTH(BirthDate) == 1 In the first two dataflows YearlyIncome <= 50000 gets evaluated an average of 12.5 times for every row whereas in the third dataflow it is evaluated once and once only. Multiply those 11.5 extra operations by 4.7million rows and you get a significant amount of extra CPU cycles – that’s where our duration difference comes from. The Wrap-up The obvious point here is that adding new components to a dataflow isn’t necessarily going to make it go any slower, moreover you may be able to achieve significant improvements by splitting logic over multiple components rather than one. Performance tuning is all about reducing the amount of work that needs to be done and that doesn’t necessarily mean use less components, indeed sometimes you may be able to reduce workload in ways that aren’t immediately obvious as I think I have proven here. Of course there are many variables in play here and your mileage will most definitely vary. I encourage you to download the package and see if you get similar results – let me know in the comments. The package contains all three dataflows plus a fourth dataflow that will create the 2GB raw file for you (you will also need the [AdventureWorksDW2008] sample database from which to source the data); simply disable all dataflows except the one you want to test before executing the package and remember, execute using dtexec, not within BIDS. If you want to explore dataflow performance tuning in more detail then here are some links you might want to check out: Inequality joins, Asynchronous transformations and Lookups Destination Adapter Comparison Don’t turn the dataflow into a cursor SSIS Dataflow – Designing for performance (webinar) Any comments? Let me know! @Jamiet

Read the article
SSIS Prehistory video

- by jamiet

I’m currently wasting spending my Easter bank holiday putting together my presentation SSIS Dataflow Performance Tuning for the upcoming SQL Bits conference in London and in doing so I’m researching some old material about how the dataflow actually works. Boring as it is I’ve gotten easily sidelined and have chanced upon an old video on Channel 9 entitled Euan Garden - Tour of SQL Server Team (part I). Euan is a former member of the SQL Server team and in this series of videos he walks the halls of the SQL Server building on Microsoft’s Redmond campus talking to some of the various protagonists and in this one he happens upon the SQL Server Integration Services team. The video was shot in 2004 so this is a fascinating (to me anyway) glimpse into the development of SSIS from before it was ever shipped and if you’re a geek like me you’ll really enjoy this behind-the-scenes look into how and why the product was architected. The video is also notable for the presence of the cameraman – none other than the now-rather-more-famous-than-he-was-then Robert Scoble. See it at http://channel9.msdn.com/posts/TheChannel9Team/Euan-Garden-Tour-of-SQL-Server-Team-part-I/ Enjoy! @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
The Work Order Printing Challenge

- by celine.beck

One of the biggest concerns we've heard from maintenance practitioners is the ability to print and batch print work order details along with its accompanying attachments. Indeed, maintenance workers traditionally rely on work order packets to complete their job. A standard work order packet can include a variety of information like equipment documentation, operating instructions, checklists, end-of-task feedback forms and the likes. Now, the problem is that most Asset Lifecycle Management applications do not provide a simple and efficient solution for process printing with document attachments. Work order forms can be easily printed but attachments are usually left out of the printing process. This sounds like a minor problem, but when you are processing high volume of work orders on a regular basis, this inconvenience can result in important inefficiencies. In order to print work order and its related attachments, maintenance personnel need to print the work order details and then go back to the work order and open each individual attachment using the proper authoring application to view and print each document. The printed output is collated into a work order packet. The AutoVue Document Print Service products that were just released in April 2010 aim at helping organizations address the work order printing challenge. Customers and partners can leverage the AutoVue Document Print Services to build a complete printing solution that complements their existing print server solution with AutoVue's document- and platform-agnostic document print services. The idea is to leverage AutoVue's printing services to invoke printing either programmatically or manually directly from within the work order management application, and efficiently process the printing of complete work order packets, including all types of attachments, from office files to more advanced engineering documents like 2D CAD drawings. Oracle partners like MIPRO Consulting, specialists in PeopleSoft implementations, have already expressed interest in the AutoVue Document Print Service products for their ability to offer print services to the PeopleSoft ALM suite, so that customers are able to print packages of documents for maintenance personnel. For more information on the subject, please consult MIPRO Consulting's article entitled Unsung Value: Primavera and AutoVue Integration into PeopleSoft posted on their blog. The blog post entitled Introducing AutoVue Document Print Service provides additional information on how the solution works. We would also love to hear what your thoughts are on the topic, so please do not hesitate to post your comments/feedback on our blog. Related Articles: Introducing AutoVue Document Print Service Print Any Document Type with AutoVue Document Print Services

Read the article
C# via Java: Introduction

- by simonc

Originally posted on: http://geekswithblogs.net/simonc/archive/2013/11/08/c-via-java-introduction.aspxSo, I've recently changed jobs. Rather than working in .NET land, I've migrated over to Java land. But never fear! I'll continue to peer under the covers of .NET, but my next series will use my new experience in Java to explore the design decisions made in the development of the C# programming language. After all, the design of C# was based on Java 1.2, and both languages have continued to evolve since then, incorporating modern software engineering concepts and requirements. Exploring the differences and similarities between the two will (hopefully) give us a deeper understanding into why .NET is implemented the way it is, the trade-offs involved, and what choices were made when new features were designed and added to the language and framework. Among others, I'll be looking at differences in: Primitives Operators Generics Exceptions Accessibility Collections Delegates and inner classes Concurrency In my next post, I'll start off by looking at the type primitives available in each language, and how Java and C# actually incorporate two different concepts of primitive types in their fundamental language design and use. I'm also thinking of looking at the inner details of Java and the JVM in my blogs, as well as C# and the CLR. If you've got any comments or thoughts on this, please let me know.

Read the article
Java, the Cloud, and Oracle at QCon San Francisco 2011

- by Bob Rhubart

If you're part of the lucky bunch attending this week's sold-out QCon San Francisco conference at Westin San Francisco Market Street, I'd like to bring several sessions to your attention. On Wednesday Nov 16, Alex Buckley, specification lead for the Java Language and the Java Virtual Machine at Oracle, will present Java 7 and 8: Where We've Been, Where We're Going, part of the Why is Java still sexy? track. The session begins at 10:35 a.m. in the Olympic room. On Thursday Nov 17, Tyler Jewell, VP Product Management for Oracle's Platform as a Service, will participate in the Performance and Scalability Panel moderated by InfoQ founder and QCon SF Program Committee Member Floyd Marinescu. That panel, part of the Performance and Scalability Solutions track, begins at 10:35 a.m. in the Olympic room. Following that panel discussion, Tyler will fly solo with a presentation on Java EE 7: Developing for the Cloud, also part of the Performance and Scalability Solutions track. That session kicks off at 12:05 p.m., also in the Olympic room. On Friday Nov 18 Tyler will jump tracks, so to speak, when he presents The Architecture of Oracle's Public Cloud, part of the Architecture Case Studies: Cloud track. That session begins at 4:50 p.m. in the Stanford room. Of course, QCon also offers ample meet-and-greet opportunities. One such opportunity happens in the hospitality suite hosted by the Java Community Process Executive Committee. That shindig gets in gear at 5:50 pm on Thursday. Throughout the QCon San Francisco conference, members of the OTN team (including your's truly) and members of the Oracle Fusion Middleware team will be on hand at the OTN booth in the conference lobby. Stop by to say hello, score some swag, and catch a demo or two.

Read the article
Code formatter for SSMS

- by blakmk

I was searching recently for a code formatter for T-Sql and I came accross this nice little utility that I wanted to share: http://www.wangz.net/cgi-bin/pp/gsqlparser/sqlpp/sqlformat.tpl I've been dealing with a lot of legacy code latley and there is nothing I find more infuriating than unformatted code. This tool seems to work quite well. Just one click and it formats everything nicely. There is also a free web version. This Web Page Created with PageBreeze Free HTML Editor

Read the article
Have SSIS' differing type systems ever caused you problems?

- by jamiet

One thing that has always infuriated me about SSIS is the fact that every package has three different type systems; to give you an idea of what I am talking about consider the following: The SSIS dataflow's type system is made up of types called DT_* (e.g. DT_STR, DT_I4) The SSIS variable type system is based on .Net datatypes (e.g. String, Int32) The types available for Execute SQL Task's parameters are based on something else - I don't exactly know what (e.g. VARCHAR, LONG) Speaking euphemistically ... this is not an optimum situation (were I not speaking euphemistically I would be a lot ruder) and hence I have submitted a suggestion to Connect at [SSIS] Consolidate three type systems into one requesting that it be remedied. This accompanying blog post is not however a request for votes (though that would be nice); the reason is actually subtler than that. Let me explain. I have been submitting bugs and suggestions pertaining to SSIS for years and have, so far, submitted over 200 Connect items. If that experience has taught me anything it is this - Connect items are not generally actioned because they are considered "nice to have". No, SSIS Connect items get actioned because they cause customers grief and if I am perfectly honest I must admit that, other than being a bit gnarly, SSIS' three type system architecture has never knowingly caused me any significant problems. The reason for this blog post is to ask if any reader out there has ever encountered any problems on account of SSIS' three type systems or have you, like me, never found them to be a problem? Errors or performance degredation caused by implicit type conversions would, I believe, present a strong case for getting this situation remedied in a future version of SSIS so if you HAVE encountered such problems I would encourage you to leave a comment on the Connect submission accordingly. Let me know in the comments too - I would be interested to hear others' opinions on this. @Jamiet

Read the article
Tab Sweep: Java EE 6 Scopes, Observer, SSL, Workshop, Virtual Server, JDBC Connection Validation

- by arungupta

Recent Tips and News on Java, Java EE 6, GlassFish & more : • How Java EE 6 Scopes Affect User Interactions (DevX.com) • Why is Java EE 6 better than Spring ? (Arun Gupta) • JavaEE Revisits Design Patterns: Observer (Murat Yener) • Getting started with Glassfish V3 and SSL (JavaDude) • Software stacks market share within Jelastic: March 2012 (Jelastic) • All aboard the Java EE 6 Love Boat! (Bert Ertman) • Full stack Java EE workshop (Kito Mann) • Create a virtual server from console in glassfish (Hector Guzman) • Glassfish – JDBC Connection Validation explained (Alexandru Ersenie) • Automatically setting the label of a component in JSF 2 (Arjan Tijms) • JSF2 + Primefaces3 + Spring3 & Hibernate4 Integration Project (Eren Avsarogullari) • THE EXECUTABLE FEEL OF JAX-RS 2.0 CLIENT (Adam Bien) Here are some tweets from this week ... web-app dtd(s) on http://t.co/4AN0057b R.I.P. using http://t.co/OTZrOEEr instead. Thank you Oracle! finally got GlassFish and Cassandra running embedded so I can unit test my app #jarhell #JavaEE6 + #NetBeans is really a pleasure to work with! Reading latest chapter in #Spring vs #JavaEE wars https://t.co/RqlGmBG9 (and yes, #JavaEE6 is better :P) @javarebel very easy install and very easy to use in combination with @netbeans and @glassfish. Save your time.

Read the article
Merge Join component sorted outputs [SSIS]

- by jamiet

One question that I have been asked a few times of late in regard to performance tuning SSIS data flows is this: Why isn’t the Merge Join output sorted (i.e.IsSorted=True)? This is a fair question. After all both of the Merge Join inputs are sorted, hence why wouldn’t the output be sorted as well? Well here’s a little secret, the Merge Join output IS sorted! There’s a caveat though – it is only under certain circumstances and SSIS itself doesn’t do a good job of informing you of it. Let’s take a look at an example. Here we have a dataflow that consumes data from the [AdventureWorks2008].[Sales].[SalesOrderHeader] & [AdventureWorks2008].[Sales].[SalesOrderDetail] tables then joins them using a Merge Join component: Let’s take a look inside the editor of the Merge Join: We are joining on the [SalesOrderId] field (which is what the two inputs just happen to be sorted upon). We are also putting [SalesOrderHeader].[SalesOrderId] into the output. Believe it or not the output from this Merge Join component is sorted (i.e. has IsSorted=True) but unfortunately the Merge Join component does not have an Advanced Editor hence it is hidden away from us. There are a couple of ways to prove to you that is the case; I could open up the package XML inside the .dtsx file and show you the metadata but there is an easier way than that – I can attach a Sort component to the output. Take a look: Notice that the Sort component is attempting to sort on the [SalesOrderId] column. This gives us the following warning: Validation warning. DFT Get raw data: {992B7C9A-35AD-47B9-A0B0-637F7DDF93EB}: The data is already sorted as specified so the transform can be removed. The warning proves that the output from the Merge Join is sorted! It must be noted that the Merge Join output will only have IsSorted=True if at least one of the join columns is included in the output. So there you go, the Merge Join component can indeed produce a sorted output and that’s very useful in order to avoid unnecessary expensive Sort operations downstream. Hope this is useful to someone out there! @Jamiet P.S. Thank you to Bob Bojanic on the SSIS product team who pointed this out to me!

Read the article
SSIS Reporting Pack – a performance tip

- by jamiet

SSIS Reporting Pack is a suite of open source SQL Server Reporting Services (SSRS) reports that provide additional insight into the SQL Server Integration Services (SSIS) 2012 Catalog. You can read more about SSIS Reporting Pack here on my blog or had over to the home page for the project at http://ssisreportingpack.codeplex.com/. After having used SSRS Reporting Pack on a real project for a few months now I have come to realise that if you have any sizeable data volumes in [SSISDB] then the reports in SSIS Reporting Pack will suffer from chronic performance problems – I have seen the “execution” report take upwards of 30minutes to return data. To combat this I highly recommend that you create an index on the [SSISDB].[internal].[event_messages].[operation_id] & [SSISDB].[internal].[operation_messages].[operation_id] fields. Phil Brammer has experienced similar problems himself and has since made it easy for the rest of us by preparing some scripts to create the indexes that he recommends and he has shared those scripts via his blog at http://www.ssistalk.com/SSIS_2012_Missing_Indexes.zip. If you are using SSIS Reporting Pack, or even if you are simply querying [SSISDB], I highly recommend that you download Phil’s scripts and test them out on your own SSIS Catalog(s). Those indexes will not solve all problems but they will make some of your reports run quicker. I am working on some further enhancements that should further improve the performance of the reports. Watch this space. @Jamiet

Read the article
Always use dtexec.exe to test performance of your dataflows. No exceptions.

- by jamiet

Earlier this evening I posted a blog post entitled Investigation: Can different combinations of components effect Dataflow performance? where I compared the performance of three different dataflows all working to the same overall goal. I wanted to make one last point related to the results but I thought it warranted a blog post all of its own. Here is a screenshot of one of the dataflows that I was testing: Pretty complicated I’m sure you’ll agree. Now, when I executed this dataflow in the test it was executing in ~19seconds however in that case I was executing using the command-line tool dtexec. I also tried executing inside the BIDS development environment and in that case it took much longer – 139seconds. That’s more than seven times as long. The point I want to make is very simple. If you are testing your dataflows for performance please use dtexec. Nothing else will suffice. @Jamiet

Read the article

Search Results

Search found 40741 results on 1630 pages for 'technology java se'.

Page 269/1630 | < Previous Page | 265 266 267 268 269 270 271 272 273 274 275 276 | Next Page >

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by pinaldave

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by Roger Brinkley

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by jamiet

- by celine.beck

- by simonc

- by Bob Rhubart

- by blakmk

- by jamiet

- by arungupta

- by jamiet

- by jamiet

- by jamiet

< Previous Page | 265 266 267 268 269 270 271 272 273 274 275 276 | Next Page >