Search Results

Search found 24464 results on 979 pages for 'autovue document print se'.

  • Parsing flat files using SSIS : SSIS Nugget

    - by jamiet
    Often when using SQL Server Integration Services (SSIS) you will find there is more than one way of accomplishing a task, and that the most obvious method might not be the optimal one. In the video below I demonstrate this by way of an experiment using SSIS’s Flat File Source component; I show different ways that you can pull data from a flat file into the SSIS dataflow and how the nature of the data itself can influence your choice of approach. If you are having trouble viewing the video in your blog reader, head to http://sqlblog.com/blogs/jamie_thomson/archive/2010/03/25/parsing-flat-files-using-ssis-ssis-nugget.aspx to see it on my blog. The main point I want to get across from this video is that a little creative thinking when building your dataflows can sometimes be very beneficial for performance; quite often a solution that isn’t the most obvious turns out to be the best one. If you have watched the video you’ll notice that my editing skills weren’t quite up to snuff and I cut off the final few words; all I was saying was that if you have any feedback on this video I would love to hear it, either via email or preferably in the comments section below. I hope this turns out to be useful to some of you.
    @Jamiet
    P.S. Incidentally, the parsing that we do using SSIS expressions in the video would be much easier if we had a TOKENISE function in SSIS’s expression language, and I have asked for such a function on Connect at [SSIS] TOKEN(string, tokeniser_string, occurence) function. Feel free to go and vote that up if you think this feature would be useful!
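
    To make the requested TOKEN(string, tokeniser_string, occurence) function concrete, here is a minimal Python sketch of one plausible behaviour. The semantics are assumptions for illustration, not a specification of any SSIS function: every character of the delimiter string counts as a delimiter, runs of delimiters collapse so empty tokens are skipped, occurrence is 1-based, and an out-of-range occurrence returns an empty string.

```python
def token(text: str, delimiters: str, occurrence: int) -> str:
    """Return the occurrence-th token (1-based) of text.

    Assumed (hypothetical) semantics:
    - every character in `delimiters` is treated as a delimiter;
    - consecutive delimiters are collapsed, so empty tokens are skipped;
    - an occurrence outside the available tokens yields "".
    """
    tokens = []
    current = []
    for ch in text:
        if ch in delimiters:
            if current:  # close the token we were building
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:  # last token has no trailing delimiter
        tokens.append("".join(current))
    if 1 <= occurrence <= len(tokens):
        return tokens[occurrence - 1]
    return ""


# Usage: pull the third field out of a record that mixes two delimiters.
print(token("AC1|Jan,100.00", "|,", 3))  # 100.00
```

    Treating each delimiter character separately is what would let a single call cope with records that mix separators, which is exactly the kind of parsing that is awkward to express with nested FINDSTRING/SUBSTRING calls.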

  • Introducing SSIS Reporting Pack for SQL Server code-named Denali

    - by jamiet
    In recent blog posts I have introduced the new SSIS Catalog that is forthcoming in SQL Server Code-named Denali:
      - What's new in SSIS in Denali
      - Introduction to SSIS Projects in Denali
      - Parameters in SSIS in Denali
      - SSIS Server, Catalogs, Environments and Environment Variables in SSIS in Denali
    The SSIS Catalog is responsible for executing SSIS packages and also for capturing the metadata from those executions. However, at the time of writing there is no mechanism provided to view, analyse and drill into that metadata, and that is why I am, in this blog post, introducing a suite of SSIS Catalog reports called the SSIS Reporting Pack, which you can download from my SkyDrive at http://cid-550f681dad532637.office.live.com/self.aspx/Public/SSIS%20Reporting%20Pack/SSISReportingPack%20v0.1.zip. In this first release the SSIS Reporting Pack includes five reports:
      - Catalog – A high-level summary of all activity in the Catalog
      - Folders – A summary of activity in each Catalog Folder
      - Folder – Project-level activity per single Folder
      - Executions – A visualisation of all executions per Folder/Project/Package/Environment or subset thereof
      - Execution – Information about an individual execution
    Here is a screenshot of the Executions report: Notice that the SSIS Reporting Pack provides a visual overview of all executions in the Catalog. Each execution is represented as a bar on the bar chart; the success or otherwise of each execution is indicated by the colour of the bar, and the execution time is indicated by the bar height. I have recorded a video that gives an overview of the SSIS Reporting Pack, which I have embedded below. If you are having any trouble viewing the video, go and see it at http://vimeo.com/17617974. I must stress that this is a very early version of the SSIS Reporting Pack and I expect it to change a lot over the coming year. I am very keen to get some feedback about this, specifically:
      - let me know if anything does not work as you expect
      - give me your feature requests
    The easiest way to get hold of me for now is via the comments section of this blog post. That’s all for now. I hope the SSIS Reporting Pack proves useful and I look forward to hearing your feedback. Lastly, that download link again: http://cid-550f681dad532637.office.live.com/self.aspx/Public/SSIS%20Reporting%20Pack/SSISReportingPack%20v0.1.zip.
    @jamiet
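
    The Executions report encodes each execution as a bar whose colour shows success or failure and whose height shows execution time. A minimal Python sketch of that mapping follows. The record fields (package, status, start, end) and the status strings "succeeded"/"failed" are hypothetical stand-ins for the Catalog's execution metadata, not the actual SSIS Catalog schema, and the colours are arbitrary.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Execution:
    # Hypothetical fields standing in for the Catalog's execution metadata.
    package: str
    status: str  # assumed to be "succeeded" or "failed"
    start: datetime
    end: datetime


def to_bars(executions):
    """Map each execution to a (label, height_seconds, colour) tuple, oldest first."""
    bars = []
    for e in sorted(executions, key=lambda e: e.start):
        height = (e.end - e.start).total_seconds()  # bar height = execution time
        colour = "green" if e.status == "succeeded" else "red"  # bar colour = outcome
        bars.append((e.package, height, colour))
    return bars


runs = [
    Execution("Load.dtsx", "failed",
              datetime(2010, 12, 2, 9, 0, 0), datetime(2010, 12, 2, 9, 0, 5)),
    Execution("Load.dtsx", "succeeded",
              datetime(2010, 12, 1, 9, 0, 0), datetime(2010, 12, 1, 9, 0, 42)),
]
print(to_bars(runs))
```

    Sorting by start time puts the bars in chronological order, so a run of short red bars stands out against the usual succeeded runs.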

  • FileNameColumnName property, Flat File Source Adapter : SSIS Nugget

    - by jamiet
    I saw a question on MSDN’s SSIS forum the other day that went something like this: “I’m loading data into a table from a flat file but I want to be able to store the name of that file as well. Is there a way of doing that?” I don’t want to come across as disrespecting those who took the time to reply, but there were a few answers along the lines of “loop over the files using a For Each, store the file name in a variable yadda yadda yadda” when in fact there is a much, much simpler way of accomplishing this; it just happens to be a little hidden away, as I shall now explain! The Flat File Source Adapter has a property called FileNameColumnName which for some reason isn’t exposed through the Flat File Source editor; it is, however, exposed via the Advanced Properties: You’ll see in the screenshot above that I have set FileNameColumnName=“Filename” (it doesn’t matter what name you use; any non-empty string will work). What this will do is create a new column in our dataflow called “Filename” that contains, unsurprisingly, the name of the file from which the row was sourced. All very simple. This is particularly useful if you are extracting data from multiple files using the MultiFlatFile Connection Manager, as it allows you to differentiate between data from each of the files, as you can see in the following screenshot: So there you have it, the FileNameColumnName property; a little-known secret of SSIS. I hope it proves to be useful to someone out there.
    @Jamiet
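
    Conceptually, setting FileNameColumnName adds one extra column that carries the source file's name alongside every row. The Python sketch below reproduces that effect for several comma-delimited files with the same layout, roughly the MultiFlatFile scenario. The file layout, file names and the "Filename" column name are assumptions for illustration; the sketch says nothing about how SSIS implements the property, for example whether it stores a full path or just a file name.

```python
import csv
import os
import tempfile


def read_with_filename(paths, filename_column="Filename"):
    """Yield each CSV row as a dict, plus a column naming its source file."""
    for path in paths:
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                row[filename_column] = path  # tag the row with where it came from
                yield row


# Usage: two small files with the same (hypothetical) layout.
folder = tempfile.mkdtemp()
paths = []
for name, body in [("jan.csv", "Account,Amount\nAC1,100.00\n"),
                   ("feb.csv", "Account,Amount\nAC1,150.00\n")]:
    path = os.path.join(folder, name)
    with open(path, "w", newline="") as f:
        f.write(body)
    paths.append(path)

for row in read_with_filename(paths):
    print(row["Account"], row["Amount"], os.path.basename(row["Filename"]))
```

    Without the extra column the two AC1 rows would be indistinguishable once they reach the destination table; with it, each row keeps its lineage.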

  • SSIS Lookup component tuning tips

    - by jamiet
    Yesterday evening I attended a London meeting of the UK SQL Server User Group at Microsoft’s offices in London Victoria. As usual it was both a fun and informative evening, and in particular there seemed to be a few questions about tuning the SSIS Lookup component; I rattled off some comments and figured it would be prudent to drop some of them into a dedicated blog post, hence the one you are reading right now.
    Scene setting
    A popular pattern in SSIS is to use a Lookup component to determine whether a record in the pipeline already exists in the intended destination table, and I cover this pattern in my 2006 blog post Checking if a row exists and if it does, has it changed? (note to self: must rewrite that blog post for SSIS2008). Fundamentally, the SSIS Lookup component (when using the FullCache option) sucks some data out of a database and holds it in memory so that it can be compared to data in the pipeline. One of the big benefits of SSIS dataflows is that they process data one buffer at a time; that means that not all of the data from your source exists in the dataflow at the same time, which is why an SSIS dataflow can process data volumes that far exceed the available memory. However, that only applies to data in the pipeline; for reasons that are hopefully obvious, ALL of the data in the lookup set must exist in the memory cache for the duration of the dataflow’s execution, which means that any memory used by the lookup cache will not be available to be used as a pipeline buffer. Moreover, there’s an obvious correlation between the amount of data in the lookup cache and the time it takes to charge (populate) that cache; the more data you have, the longer it will take to charge and the longer you have to wait until the dataflow actually starts to do anything. For these reasons your goal is simple: ensure that the lookup cache contains as little data as possible.
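
    Because the whole lookup set has to sit in memory, a back-of-the-envelope size estimate helps when deciding which columns and rows to keep. The sketch below uses the widths described in this post’s tips: DT_STR takes one byte per declared character, DT_WSTR two, and DT_I2/DT_I4/DT_I8 take 2, 4 and 8 bytes. It ignores per-row and per-buffer overhead, so treat the result as a lower bound rather than what SSIS will actually allocate; the row count and column lists are hypothetical.

```python
# Bytes per value for the fixed-width integer types.
INT_WIDTHS = {"DT_I2": 2, "DT_I4": 4, "DT_I8": 8}


def column_bytes(dtype: str, length: int = 0) -> int:
    """Approximate bytes for one value of a column of the given SSIS type."""
    if dtype == "DT_STR":
        return length          # one byte per declared character
    if dtype == "DT_WSTR":
        return 2 * length      # two bytes per declared character
    return INT_WIDTHS[dtype]


def estimate_cache_bytes(rows: int, columns) -> int:
    """Lower-bound estimate of a full cache: rows x sum of column widths."""
    return rows * sum(column_bytes(*col) for col in columns)


# Hypothetical lookup set of 1,000,000 rows: a key plus a 20-character code.
narrow = [("DT_I4",), ("DT_STR", 20)]
wide = [("DT_I8",), ("DT_WSTR", 20)]
print(estimate_cache_bytes(1_000_000, narrow))  # 24000000
print(estimate_cache_bytes(1_000_000, wide))    # 48000000
```

    Choosing the wider types alone doubles the estimate here, before adding any unnecessary columns or rows.
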
    General tips
    Here is a simple tick list you can follow in order to tune your lookups:
      - Use a SQL statement to charge your cache; don’t just pick a table from the dropdown list made available to you. (Read why in SELECT *... or select from a dropdown in an OLE DB Source component?)
      - Only pick the columns that you need; ignore everything else.
      - Make the database columns that your cache is populated from as narrow as possible. If a column is defined as VARCHAR(20) then SSIS will allocate 20 bytes for every value in that column, which is a big waste if the actual values are significantly shorter than 20 characters.
      - Do you need DT_WSTR typed columns or will DT_STR suffice? DT_WSTR uses twice the amount of space to hold values that can be stored using DT_STR, so if you can use DT_STR, consider doing so. The same principle applies to the numerical data types DT_I2/DT_I4/DT_I8.
      - Only populate the cache with data that you KNOW you will need. In other words, think about your WHERE clause!
    Thinking outside the box
    It is tempting to build a large monolithic dataflow that does many things, one of which is a Lookup. Often, though, you can make better use of your available resources by, well, mixing things up a little. Here are a few ideas to get your creative juices flowing:
      - There is no rule that says everything has to happen in a single dataflow. If you have some particularly resource-intensive lookups then consider putting each one into a dataflow all of its own and using raw files to pass the pipeline data in and out of that dataflow.
      - Know your data. If you think, for example, that the majority of your incoming rows will match only a small subset of your lookup data, then consider chaining multiple lookup components together; the first would use a FullCache containing that data subset, and the remaining data that doesn’t find a match could be passed to a second lookup that perhaps uses NoCache, thus avoiding the need to pull all of that least-used lookup data into memory.
      - Do you need to process all of your incoming data at once? If you can process different partitions of your data separately then you can partition your lookup cache as well. For example, if you are using a lookup to convert a location into a [LocationId], why not process your data one region at a time? This means your lookup cache only has to contain data for the region that you are currently processing, and because the Lookup in SSIS2008 and beyond can charge its cache using a dynamically built SQL statement, you’ll be able to achieve this using the same dataflow and simply loop over it using a ForEach loop.
      - Taking the previous data partitioning idea further: a dataflow can contain more than one data path, so why not split your data using a Conditional Split component and, again, charge your lookup caches with only the data that they need for that partition?
      - Lookups have two uses: (1) to find a matching row from the lookup set and (2) to put attributes from that matching row into the pipeline. Ask yourself, do you need to do these two things at the same time? After all, once you have the key column(s) from your lookup set you can use that key to get the rest of the attributes further downstream, perhaps even in another dataflow.
      - Are you using the same lookup data set multiple times? If so, consider the file caching option in SSIS 2008 and beyond.
    Above all, experiment and be creative with different combinations. You may be surprised at what works.
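
    The chained-lookup idea from the list above (a full cache holding only the frequently matched subset, with misses passed on to a second, uncached lookup) can be sketched in a few lines of Python. Here a dictionary stands in for the FullCache lookup and a hypothetical `query_database` callable stands in for the per-row query a NoCache lookup would issue; both, and the sample data, are illustrative assumptions.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def chained_lookup(
    rows: Iterable[str],
    hot_cache: Dict[str, int],
    query_database: Callable[[str], Optional[int]],
) -> Tuple[List[Tuple[str, Optional[int]]], int]:
    """Resolve keys via a small in-memory cache, falling back to per-row queries.

    Returns the resolved (key, id) pairs and how many fallback queries ran.
    """
    results = []
    fallback_queries = 0
    for key in rows:
        if key in hot_cache:                 # first lookup: full cache of the subset
            results.append((key, hot_cache[key]))
        else:                                # second lookup: no cache, one query per row
            fallback_queries += 1
            results.append((key, query_database(key)))
    return results, fallback_queries


# Hypothetical data: most incoming rows hit "London" or "Paris".
full_table = {"London": 1, "Paris": 2, "Oslo": 3, "Lima": 4}
hot = {k: full_table[k] for k in ("London", "Paris")}
incoming = ["London", "Paris", "London", "Oslo", "Paris"]

resolved, misses = chained_lookup(incoming, hot, full_table.get)
print(resolved)
print(misses)  # 1
```

    Only the hot subset occupies memory, and the per-row cost is paid just for the rare misses; whether that trade-off wins depends on how skewed your data really is.
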
    Final thoughts
    If you want to know more about how the Lookup component differs in SSIS2008 from SSIS2005, I have a dedicated blog post about that at Lookup component gets a makeover. I am on a mini-crusade at the moment to get a BULK MERGE feature into the database engine, the thinking being that if the database engine could quickly merge massive amounts of data in a similar manner to how it can insert massive amounts using BULK INSERT, then that’s a lot of work that wouldn’t have to be done in the SSIS pipeline. If you think that is a good idea then go and vote for BULK MERGE on Connect. If you have any other tips to share then please stick them in the comments. Hope this helps!
    @Jamiet

  • Dynamic Unpivot : SSIS Nugget

    - by jamiet
    A question on the SSIS forum earlier today asked:
      “I need to dynamically unpivot some set of columns in my source file. Every month there is one new column and its set of Values. I want to unpivot it without editing my SSIS packages that is deployed”
    Let’s be clear about what we mean by Unpivot. It is a normalisation technique that basically converts columns into rows. By way of example, it converts something like this:

        AccountCode  Jan     Feb     Mar
        AC1          100.00  150.00  125.00
        AC2          45.00   75.50   90.00

    into something like this:

        AccountCode  Month  Amount
        AC1          Jan    100.00
        AC1          Feb    150.00
        AC1          Mar    125.00
        AC2          Jan    45.00
        AC2          Feb    75.50
        AC2          Mar    90.00

    The Unpivot transformation in SSIS is perfectly capable of carrying out the operation defined in this example; however, in the case outlined in the aforementioned forum thread the problem was a little different. I interpreted it to mean that the number of columns could change, and in that scenario the Unpivot transformation (and indeed the SSIS dataflow in general) is rendered useless because it expects the number of columns not to change from what is specified at design time. There is a workaround, however. Assuming that all of the columns that CAN exist will appear at the end of the rows, we can (1) import all of those trailing columns as just a single column, (2) use a script component to loop over all the values in that “column” and (3) output each one as a row all of its own. Let’s go over that in a bit more detail. I’ve prepared a data file containing the data we want to unpivot: some customers and their mythical shopping lists (it has column names in the first row): We use a Flat File Connection Manager to specify the format of our data file to SSIS: and a Flat File Source Adapter to put it into the dataflow (no need for a screenshot of that one; it’s very basic). Notice that the values that we want to unpivot all exist in a column called [Groceries].
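
    The basic (static) Unpivot in the AccountCode example can be expressed in a few lines of Python. This only illustrates the column-to-row transformation; it is not how the SSIS Unpivot transformation is implemented.

```python
def unpivot(rows, key_column, value_columns, name_column, value_column):
    """Turn each listed value column into its own (key, name, value) row."""
    out = []
    for row in rows:
        for col in value_columns:
            out.append({key_column: row[key_column],
                        name_column: col,
                        value_column: row[col]})
    return out


source = [
    {"AccountCode": "AC1", "Jan": 100.00, "Feb": 150.00, "Mar": 125.00},
    {"AccountCode": "AC2", "Jan": 45.00, "Feb": 75.50, "Mar": 90.00},
]
for r in unpivot(source, "AccountCode", ["Jan", "Feb", "Mar"], "Month", "Amount"):
    print(r["AccountCode"], r["Month"], r["Amount"])
```

    Note that `value_columns` is a fixed list supplied up front. That is the design-time assumption which breaks when a new month column appears, and it is why the forum poster’s case needs a different approach.
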
    Now onto the script component, where the real work goes on, although the code is pretty simple: Here I show a screenshot of this executing along with some data viewers. As you can see, we have successfully pulled out each of the values into a row of its own, thus accomplishing the Dynamic Unpivot that the forum poster was after. If you want to run the demo for yourself, I have uploaded the demo package and source file to my SkyDrive: http://cid-550f681dad532637.skydrive.live.com/self.aspx/Public/BlogShare/20100529/Dynamic%20Unpivot.zip. Simply extract the two files into a folder, make sure the Connection Manager is pointing to the file, and execute! Hope this is useful.
    @Jamiet
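
    The script component’s code appears only as a screenshot, so it is not reproduced here. The Python sketch below shows the same idea under stated assumptions: the file is comma-delimited with a header row, the first field is the customer, and everything after it is read into a single Groceries field (mimicking a flat file definition whose last column swallows the rest of the row). Splitting that field then yields one output row per item, however many items a row has. The column names and sample data are hypothetical.

```python
def dynamic_unpivot(lines, delimiter=","):
    """Yield (customer, item) pairs from rows with a variable number of trailing values."""
    rows = iter(lines)
    next(rows, None)  # skip the header row
    for line in rows:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Split only once: field 1 is the customer, the rest is one "Groceries" column.
        parts = line.split(delimiter, 1)
        customer = parts[0]
        groceries = parts[1] if len(parts) > 1 else ""
        # The "script component" step: one output row per value in that column.
        for item in groceries.split(delimiter):
            if item:
                yield customer, item


data = """Customer,Groceries
Alice,Bread,Milk
Bob,Eggs,Apples,Cheese
"""
for customer, item in dynamic_unpivot(data.splitlines()):
    print(customer, item)
```

    Because nothing in the code names the individual trailing columns, a new column next month simply produces more output rows, with no change to the package.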

  • Inequality joins, Asynchronous transformations and Lookups : SSIS

    - by jamiet
    It is pretty much accepted by SQL Server Integration Services (SSIS) developers that synchronous transformations are generally quicker than asynchronous transformations (for a description of synchronous and asynchronous transformations go read Asynchronous and synchronous data flow components). Notice I said “generally” and not “always”; there are circumstances where using asynchronous transformations can be beneficial, and in this blog post I’ll demonstrate such a scenario, one that is pretty common when building data warehouses. Imagine I have a [Customer] dimension table that manages information about all of my customers as a slowly changing dimension. If that is a type 2 slowly changing dimension then you will likely have multiple rows per customer in that table. Furthermore, you might also have datetime fields that indicate the effective time period of each member record. Here is such a table that contains data for four dimension members {Terry, Max, Henry, Horace}: Notice that we have multiple records per customer and that the [SCDStartDate] of a record is equal to the [SCDEndDate] of the record that preceded it (if there was one). (Note that I am on record as saying I am not a fan of this technique of storing an [SCDEndDate], but for the purposes of clarity I have included it here.) Anyway, the idea here is that we will have some incoming data containing [CustomerName] & [EffectiveDate] and we need to use those values to look up [Customer].[CustomerId]. The logic will be:
        Lookup a [CustomerId] WHERE [CustomerName]=[CustomerName] AND [SCDStartDate] <= [EffectiveDate] AND [EffectiveDate] <= [SCDEndDate]
    The conventional approach to this would be to use a fully cached lookup, but that isn’t an option here because we are using inequality conditions. The obvious next step, then, is to use a non-cached lookup, which enables us to change the SQL statement to use inequality operators: Let’s take a look at the dataflow: Notice these are all synchronous components.
    This approach works just fine; however, it does have the limitation that it has to issue a SQL statement against your lookup set for every row, so we can expect the execution time of our dataflow to increase linearly with the number of rows in our dataflow; that’s not good. OK, that’s the obvious method. Let’s now look at a different way of achieving this, using an asynchronous Merge Join transform coupled with a Conditional Split. I’ve shown it post-execution so that I can include the row counts, which help to illustrate what is going on here: Notice that there are more rows output from our Merge Join component than on the input. That is because we are joining on [CustomerName] and, as we know, we have multiple records per [CustomerName] in our lookup set. Notice also that there are two asynchronous components in here (the Sort and the Merge Join). I have embedded a video below that compares the execution times for each of these two methods. The video is just over 8 minutes long. For those who can’t be bothered watching the video, I’ll tell you the results here: the dataflow that used the Lookup transform took 36 seconds, whereas the dataflow that used the Merge Join took less than two seconds. An illustration in case it is needed: Pretty conclusive proof that in some scenarios it may be quicker to use an asynchronous component than a synchronous one. Your mileage may of course vary. The scenario outlined here is analogous to performance tuning procedural SQL that uses cursors. It is common to eliminate cursors by converting them to set-based operations, and that is effectively what we have done here. Our non-cached lookup performs a discrete operation for every single row of data, exactly like a cursor does. By eliminating this cursor-in-disguise we have dramatically sped up our dataflow. I hope all of that proves useful.
    You can download the package that I demonstrated in the video from my SkyDrive at http://cid-550f681dad532637.skydrive.live.com/self.aspx/Public/BlogShare/20100514/20100514%20Lookups%20and%20Merge%20Joins.zip. Comments are welcome as always.
    @Jamiet
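
    The two approaches can be compared in miniature in Python. This is a sketch, not the package from the video: the dimension rows, IDs and dates are made up (only the customer names come from the post), a per-row scan stands in for the non-cached Lookup’s one query per row, and a dictionary grouping by [CustomerName] stands in for the Sort + Merge Join pair (a hash-based join rather than a true sort-merge), with the range test playing the Conditional Split’s role. It demonstrates equivalent results, not the timings.

```python
from collections import defaultdict
from datetime import date

# Hypothetical type 2 dimension rows:
# (CustomerId, CustomerName, SCDStartDate, SCDEndDate)
DIMENSION = [
    (1, "Terry", date(2009, 1, 1), date(2009, 6, 1)),
    (2, "Terry", date(2009, 6, 1), date(9999, 12, 31)),
    (3, "Max", date(2009, 1, 1), date(9999, 12, 31)),
]


def lookup_per_row(incoming):
    """Non-cached style: search the lookup set separately for every incoming row."""
    out = []
    for name, effective in incoming:
        match = None
        for cid, dname, start, end in DIMENSION:  # one "query" per incoming row
            if dname == name and start <= effective <= end:
                match = cid
                break
        out.append((name, effective, match))
    return out


def join_then_filter(incoming):
    """Set-based style: join on name, then keep the row whose date range fits."""
    by_name = defaultdict(list)
    for cid, dname, start, end in DIMENSION:
        by_name[dname].append((cid, start, end))
    out = []
    for name, effective in incoming:
        # Join step: one incoming row can fan out to several dimension rows.
        candidates = by_name.get(name, [])
        # Filter step (the Conditional Split's job): apply the range predicate.
        matches = [cid for cid, start, end in candidates if start <= effective <= end]
        out.append((name, effective, matches[0] if matches else None))
    return out


incoming = [
    ("Terry", date(2009, 3, 15)),
    ("Terry", date(2010, 1, 1)),
    ("Max", date(2009, 7, 4)),
]
print(join_then_filter(incoming))
```

    Both functions return the same IDs; the first repeats its search for every incoming row, while the second builds the grouping once, which is the cursor versus set-based contrast described above. Like the logic in the post, both range tests are inclusive at each end, so a date exactly on a shared boundary would match two records (the sketch takes the first).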

  • Enforce SSIS naming conventions using BI-xPress

    - by jamiet
    A long long long time ago (in 2006 in fact) I published a blog post entitled Suggested Best Practises and naming conventions, in which I suggested a bunch of acronyms that folks could use to prefix object names in their SSIS packages, thus allowing easier identification of those objects in log records. Here is a sample of some of those suggestions: If you have adopted these naming conventions (and I am led to believe that a bunch of people have) then you might like to know that you can now check for adherence to these conventions using a tool called BI-xPress from Pragmatic Works. BI-xPress includes a feature called the Best Practices Analyzer that scans your packages and assesses them according to rules that you specify. In addition, Pragmatic Works have made available a collection of these rules that adhere to the naming conventions I specified in 2006. You can download this collection; however, I recommend you first read the accompanying article that demonstrates the capabilities of the Best Practices Analyzer. Pretty cool stuff.
    @Jamiet
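
    BI-xPress’s own rule format is not described here, so as a rough, tool-independent illustration of what a naming rule checks, the Python sketch below flags objects whose names lack the prefix expected for their type. The prefix map is a hypothetical sample, not a copy of the 2006 list, and requiring a space after the prefix is likewise an assumption.

```python
# Hypothetical prefix map: object type -> required name prefix.
PREFIXES = {
    "Data Flow Task": "DFT",
    "Execute SQL Task": "SQL",
    "Script Task": "SCR",
    "Sequence Container": "SEQC",
}


def naming_violations(objects):
    """Return (type, name) pairs whose name doesn't start with the expected prefix."""
    violations = []
    for obj_type, name in objects:
        prefix = PREFIXES.get(obj_type)
        if prefix is None:
            continue  # no rule defined for this object type
        if not name.startswith(prefix + " "):
            violations.append((obj_type, name))
    return violations


package = [
    ("Data Flow Task", "DFT Load Customers"),
    ("Execute SQL Task", "Truncate staging"),
    ("Script Task", "SCR Set variables"),
]
print(naming_violations(package))  # [('Execute SQL Task', 'Truncate staging')]
```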

  • Splwow64 with TS Easy Print

    - by Tim Brigham
    I have an application (Sage MIP Fund Accounting) which exports data to Excel. In this process it uses an internal print driver. Since we upgraded from 2008 to 2008 R2, this export process causes the system to hang. This has been isolated to the splwow64 executable hanging while the Excel document is being built. If I kill the splwow64 executable, things function properly (I just can’t print the document once it is completed). This only occurs when using printer redirection via the Remote Desktop Easy Print function; if I remove the printer redirection, things work exactly as expected. I’ve spent the last couple of hours looking at hotfixes and driver upgrades, since this appears to be a problem specifically with how the Remote Desktop Easy Print printer is functioning. Is anyone aware of a hotfix that would be applicable in this situation? I don’t want to grab every hotfix for redirected printing and start throwing them out there.

  • Querying the SSIS Catalog? Here’s a handy query!

    - by jamiet
    I’ve been working on a SQL Server Integration Services (SSIS) solution for about 6 months now and I’ve learnt many many things that I intend to share on this blog just as soon as I get the time. Here’s a very short starter-for-ten… I’ve found the following query to be utterly invaluable when interrogating the SSIS Catalog to discover what is going on in my executions:

        SELECT  event_message_id, MESSAGE, package_name, event_name,
                message_source_name, package_path, execution_path,
                message_type, message_source_type
        FROM    (
                SELECT  em.*
                FROM    SSISDB.catalog.event_messages em
                WHERE   em.operation_id = (SELECT MAX(execution_id) FROM SSISDB.catalog.executions)
                    AND event_name NOT LIKE '%Validate%'
                ) q
        /* Put in whatever WHERE predicates you might like */
        --WHERE event_name = 'OnError'
        --WHERE package_name = 'Package.dtsx'
        --WHERE execution_path LIKE '%<some executable>%'
        ORDER BY message_time DESC

    Know it. Learn it. Love it.
    @jamiet

  • SQL SERVER – Free Print Book on SQL Server Joes 2 Pros Kit

    - by pinaldave
    Rick Morelan and I were discussing earlier this month what we could give back to the community. We believe our books have been very successful and very well received by the community. The five books are a journey from novice to expert. The books have changed many lives and helped many people get jobs as well as pass the SQL certifications. Rick is from Seattle, USA and I am from Bangalore, India. There is a 12-hour difference between us. We try to hold a weekly meeting to catch up on various personal and SQL-related topics. Here is one of our recent conversations.
    Rick and Pinal
    Pinal: Good Morning Rick!
    Rick: Good Morning…err… Good Evening to you – Pinal!
    Pinal: Hey Rick, did you read the recent email which I sent you? One of our readers is thanking us for writing the Joes 2 Pros series. He wants to dedicate his success to us. Can you believe it?
    Rick: Yeah, he is very kind, but did you tell him that it is all because of his hard work learning the subject and that we contributed very little to his success?
    Pinal: Absolutely, I told him the same – I said we just wrote the book but it is he who learned from it and proved himself in his job. It is all him! We were just igniters.
    Rick: Good response.
    Pinal: Hey Rick! Are we doing enough for the community? What more can we do?
    Rick: Hmmm… Let us do something more.
    Pinal: Remember we once discussed the idea that anyone who buys our Joes 2 Pros Combo Kit in the next 2 weeks gets SQL Wait Stats for free? What do you say?
    Rick: I agree! Great idea! Let us do it.
    Free Giveaway
    Well, Rick and I liked the idea of doing more. We have decided to give away free SQL Server Wait Stats books to everybody who purchases the Joes 2 Pros Combo Kit between today (Oct 15, 2012) and Oct 26, 2012. This is not a contest or a lucky-winner opportunity. Everybody who participates will qualify.
    Combo Availability
    USA – Amazon | India – Flipkart | Indiaplaza
    Note 1: The USA kit contains 5 free DVDs. The India kit does not contain the 5 DVDs due to legal issues.
    Note 2: The India kit is priced at a special Indian economic price.
    Qualify for Free Giveaway
    You must have purchased our Joes 2 Pros Combo Kit of 5 books between Oct 15, 2012 and Oct 26, 2012. Purchases made before Oct 15, 2012 or after Oct 26, 2012 will not qualify for this giveaway. Send your original receipt (email, order details) to the following addresses: “[email protected];[email protected]” with the subject line “Joes 2 Pros Kit Promotion Free Offer”. Do not change the subject line or your email may be missed. Clearly mention your shipping address with phone number and pin/zip code. Send your receipt before Oct 30, 2012. We will not entertain any conversation after the Oct 30, 2012 cut-off date. The free books will be sent to USA and India addresses only.
    Availability
    USA - Amazon | India - Flipkart | Indiaplaza
    Reference: Pinal Dave (http://blog.sqlauthority.com)

  • Java Spotlight Episode 77: Donald Smith on the OpenJDK and Java

    - by Roger Brinkley
    An interview with Donald Smith about Java and OpenJDK. Joining us this week on the Java All Star Developer Panel are Dalibor Topic, Java Free and Open Source Software Ambassador, and Arun Gupta, Java EE Guy. Right-click or Control-click to download this MP3 file. You can also subscribe to the Java Spotlight Podcast Feed to get the latest podcast automatically. If you use iTunes you can open iTunes and subscribe with this link: Java Spotlight Podcast in iTunes.
    Show Notes
    News
      - Jersey 2.0 Milestone 2 available
      - Oracle distribution of Eclipse (OEPE) now supports GlassFish 3.1.2
      - Oracle Linux 6 is now part of the certification matrix for 3.1.2
      - 3rd part of Spring -> Java EE 6 article series published
      - Joe Darcy - Repeating annotations in the works
      - JEP 152: Crypto Operations with Network HSMs
      - JEP 153: Launch JavaFX Applications
      - OpenJDK bug database: Status update
      - OpenJDK Governing Board 2012 Election: Results
      - jtreg update March 2012
      - Take Two: Comparing JVMs on ARM/Linux
      - The OpenJDK group at Oracle is growing
      - App bundler project now open
    Events
      - April 4-5, JavaOne Japan, Tokyo, Japan
      - April 11, Cleveland JUG, Cleveland, OH
      - April 12, GreenJUG, Greenville, SC
      - April 17-18, JavaOne Russia, Moscow, Russia
      - April 18–20, Devoxx France, Paris, France
      - April 17-20, GIDS, Bangalore
      - April 21, Java Summit, Chennai
      - April 26, Mix-IT, Lyon, France
      - May 3-4, JavaOne India, Hyderabad, India
      - May 5, Bangalore, Pune, ?? - JUG outreach
      - May 7, OTN Developer Day, Mumbai
      - May 8, OTN Developer Day, Delhi
    Feature Interview
    Donald Smith, MBA, MSc, is Director of Product Management for Oracle. He brings worldwide enterprise software experience, ranging from small "dot-com" through Fortune 500 companies. Donald speaks regularly about Java, open source, community development, business models, business integration and software development politics at conferences and events worldwide, including JavaOne, Oracle World, Sun Tech Days, Evans Developer Relations Conference, OOPSLA, JAOO, Server Side Symposium, Colorado Software Summit and others. Prior to returning to Oracle, Donald was Director of Ecosystem Development for the Eclipse Foundation, an independent not-for-profit foundation supporting the Eclipse open source community.
    Mail Bag
    What’s Cool
      - OpenJDK 7 port to Haiku
      - JEP 154: Remove Serialization
      - Goto for the Java Programming Language

  • SSIS Prehistory video

    - by jamiet
    I’m currently spending my Easter bank holiday putting together my presentation SSIS Dataflow Performance Tuning for the upcoming SQL Bits conference in London, and in doing so I’m researching some old material about how the dataflow actually works. Boring as it is, I’ve been easily sidelined and have chanced upon an old video on Channel 9 entitled Euan Garden - Tour of SQL Server Team (part I). Euan is a former member of the SQL Server team, and in this series of videos he walks the halls of the SQL Server building on Microsoft’s Redmond campus talking to some of the various protagonists; in this one he happens upon the SQL Server Integration Services team. The video was shot in 2004, so this is a fascinating (to me anyway) glimpse into the development of SSIS from before it ever shipped, and if you’re a geek like me you’ll really enjoy this behind-the-scenes look at how and why the product was architected. The video is also notable for the presence of the cameraman: none other than the now-rather-more-famous-than-he-was-then Robert Scoble. See it at http://channel9.msdn.com/posts/TheChannel9Team/Euan-Garden-Tour-of-SQL-Server-Team-part-I/ Enjoy!
    @Jamiet

  • how do you document your development process?

    - by David
    My current state is a mixture of spreadsheets, wikis, documents, and dated folders for my input/configuration and output files and bzr version control for code. I am relatively new to programming that requires this level of documentation, and I would like to find a better, more coherent approach. update (for clarity): My inputs are data used to generate configuration files with parameter values and my outputs are analyses of model predictions. I would really like to have an approach that allows me to associate particular configuration(s) with particular outputs, so that I can ask questions of my documentation such as "what causes over/under estimates?" or "what causes error 'X'"?

    Read the article

  • SSIS Catalog, Windows updates and deployment failures due to System.Core mismatch

    - by jamiet
    This is a heads-up for anyone doing development on SSIS. On my current project, where we are implementing a SQL Server Integration Services (SSIS) 2012 solution, we recently encountered a situation where we were unable to deploy any of our projects even though we had successfully deployed in the past. Any attempt to use the deployment wizard resulted in an error dialog. The text of the error (for all you search engine crawlers out there) was: A .NET Framework error occurred during execution of user-defined routine or aggregate "create_key_information": System.IO.FileLoadException: Could not load file or assembly 'System.Core, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' or one of its dependencies. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040) ---> System.IO.FileLoadException: The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040) System.IO.FileLoadException: System.IO.FileLoadException: at Microsoft.SqlServer.IntegrationServices.Server.Security.CryptoGraphy.CreateSymmetricKey(String algorithm) at Microsoft.SqlServer.IntegrationServices.Server.Security.CryptoGraphy.CreateKeyInformation(SqlString algorithmName, SqlBytes& key, SqlBytes& IV) . (Microsoft SQL Server, Error: 6522) After some investigation and a bit of back and forth with some very helpful members of the SSIS product team (hey Matt, Wee Hyong), it transpired that this was due to a .NET Framework fix that had been delivered via Windows Update. I took a look at the server's update history and indeed there had been some recently applied .NET Framework updates. This fix had (in the words of Matt Masson) “somehow caused a mismatch on System.Core for SQLCLR” and, as you may know, SQLCLR is used heavily within the SSIS Catalog. The fix was pretty simple: restart SQL Server. This causes the assemblies to be upgraded automatically.
If you are using Data Quality Services (DQS) you may have experienced similar problems, which are documented at Upgrade SQLCLR Assemblies After .NET Framework Update. I am hoping the SSIS team will follow up with a more thorough explanation on their blog soon. You DBAs out there may be questioning why Windows Update is set to automatically apply updates on our production servers. We’re checking that out with our hosting provider right now. You have been warned! @Jamiet

    Read the article

  • Java Spotlight Episode 86: Tony Printezis on Garbage Collection First

    - by Roger Brinkley
    Interview with Tony Printezis on the Garbage-First (G1) collector. Joining us this week on the Java All Star Developer Panel is Arun Gupta, Java EE Guy. Right-click or Control-click to download this MP3 file. You can also subscribe to the Java Spotlight Podcast Feed to get the latest podcast automatically. If you use iTunes you can open iTunes and subscribe with this link: Java Spotlight Podcast in iTunes. Show Notes. News: JSR 358: A major revision of the Java Community Process - JCP 3.Next; JAX-RS 2.0 Early Draft - Third Edition. Events: June 11-14, Cloud Computing Expo, New York City; June 12, Boulder JUG; June 13, Denver JUG; June 13, Eclipse Juno DemoCamp, Redwood Shores; June 13, JUG Münster; June 14, Java Klassentreffen, Vienna, Austria; June 18-20, QCon, New York City; June 19, CJUG, Chicago; June 20, 1871, Chicago; June 26-28, Jazoon, Zurich, Switzerland; June 27, Houston JUG ??; July 5, Java Forum, Stuttgart, Germany; July 13-14, IndicThreads, Delhi; July 30-August 1, JVM Language Summit, Santa Clara. Feature Interview: Tony Printezis is a Principal Member of Technical Staff at Oracle, based in Burlington, MA. He has been contributing to the Java HotSpot Virtual Machine since 2006. He spends most of his time working on dynamic memory management for the Java platform, concentrating on performance, scalability, responsiveness, parallelism, and visualization of garbage collectors. He obtained a Ph.D. in 2000 and a BSc (Hons) in 1995, both from the University of Glasgow in Scotland. In addition, he is a JavaOne Rock Star, a title awarded for his highly rated JavaOne session on GC. Mail Bag. What’s Cool: JavaOne content selection is complete. Notifications done.

    Read the article

  • Design and Print Your Own Christmas Cards in MS Word, Part 1

    - by Eric Z Goodnight
    Looking for a little DIY fun this holiday season? Open up the familiar MS Word and create simple, beautiful Christmas and holiday cards, and impress your family with your crafting skills. This is the first part of a two-part article. In this first section, we’ll tackle design in MS Word. In the second, we’ll cover supplies and proper printing methods to get a great look out of your dusty old inkjet.

    Read the article

  • Investigation: Can different combinations of components effect Dataflow performance?

    - by jamiet
    Introduction: The Dataflow task is one of the core components (if not the core component) of SQL Server Integration Services (SSIS) and often the most misunderstood. This is not surprising; it’s an incredibly complicated beast, and we’re abstracted away from that complexity via some boxes that go yellow, red or green and that have some lines drawn between them. In this blog post I intend to look under that facade and get into some of the nuts and bolts of the Dataflow Task by investigating how the decisions we make when building our packages can affect performance. I will do this by comparing the performance of three dataflows that all have the same input and all produce the same output, but which operate slightly differently by way of having different transformation components. I also want to use this blog post to challenge a commonly held opinion that I see perpetuated over and over again on the SSIS forum: that adding components to a dataflow will be detrimental to overall performance. It’s not surprising that people think this (it is intuitive to think that more components means more work); however, this is not a view that I share. I have always been of the opinion that there are many factors affecting dataflow duration and that the number of components is actually one of the less important ones; having said that, I have never proven that assertion, and that is one reason for this investigation. I have actually seen evidence that some people think dataflow duration is simply a function of the number of rows and the number of components. I’ll happily call that one out as a myth even without any investigation!
The Setup: I have a 2GB data file containing 4,731,904 (~4.7 million) customer records with various attributes against them, and it contains 2 columns that I am going to use for categorisation: [YearlyIncome] and [BirthDate]. The data file is an SSIS raw format file, which I chose because it is the quickest way of getting data into a dataflow; given that I am testing the transformations, not the source or destination adapters, I want to minimise external influences as much as possible. In the test I will split the customers according to month of birth (12 of those) and whether their yearly income is above or below 50000 (2 of those); in other words, I will be splitting them into 24 discrete categories, and to do it I shall be using different combinations of SSIS’ Conditional Split and Derived Column transformation components. Each of the 24 resulting datapaths will feed a Row Count component, again because this is the least resource-intensive means of terminating a datapath. The test is being carried out on a Dell XPS Studio laptop with a quad-core (8 logical processors) Intel Core i7 at 1.73GHz and a Samsung SSD. It’s running SQL Server 2008 R2 on Windows 7. The Variables: Here are the three combinations of components that I am going to test. (1) One Conditional Split - A single Conditional Split component, CSPL Split by Month of Birth and income category, that will use expressions on [YearlyIncome] and [BirthDate] to send each row to one of 24 outputs. This next screenshot displays the expression logic in use. (2) Derived Column & Conditional Split - A Derived Column component, DER Income Category, that adds a new column [IncomeCategory] which will contain one of two possible text values {“LessThan50000”, “GreaterThan50000”} and uses [YearlyIncome] to determine which value each row should get.
A Conditional Split component, CSPL Split by Month of Birth and Income Category, then uses that new column in conjunction with [BirthDate] to determine which of the same 24 outputs to send each row to. Put more simply, I am separating the Conditional Split of #1 into a Derived Column and a Conditional Split. The next screenshots display the expression logic in use (DER Income Category; CSPL Split by Month of Birth and Income Category). (3) Three Conditional Splits - A Conditional Split component that produces two outputs based on [YearlyIncome], one for each income category. Each of those outputs goes to a further Conditional Split that splits its input into 12 outputs, one for each month of birth (identical logic in each). In this case, then, I am separating the single Conditional Split of #1 into three Conditional Split components. The next screenshots display the expression logic in use (CSPL Split by Income Category; CSPL Split by Month of Birth 1 & 2). Each of these combinations will provide an input to one of the 24 Row Count components, just the same as before. For illustration, here is a screenshot of the dataflow containing three Conditional Split components. As you can see, these dataflows have a fair bit of work to do, and remember that they’re doing that work for 4.7 million rows. I will execute each dataflow 10 times and use the average for comparison. I foresee three possible outcomes: (a) the dataflow containing just one Conditional Split (i.e. #1) will be quicker; (b) there is no significant difference between any of them; (c) one of the two dataflows containing multiple transformation components will be quicker. Regardless of which of those outcomes comes to pass, we will have learnt something, and that makes this an interesting test to carry out. Note that I will be executing the dataflows using dtexec.exe rather than hitting F5 within BIDS. The Results and Analysis: The table below shows all of the executions, 10 for each dataflow.
It also shows the average for each along with a standard deviation. All durations are in seconds. I’m pasting a screenshot because I frankly can’t be bothered with the faffing about needed to make a presentable HTML table. It is plain to see from the averages that the dataflow containing three Conditional Splits is significantly faster, the other two taking 43% and 52% longer respectively. This seems strange though, right? Why does the dataflow containing the most components outperform the other two by such a big margin? The answer is actually quite logical when you put some thought into it, and I’ll explain it below. Before progressing, a side note: the standard deviation for the “Three Conditional Splits” dataflow is orders of magnitude smaller, indicating that performance for this dataflow can be predicted with much greater confidence too. The Explanation: I refer you to the screenshot above that shows how CSPL Split by Month of Birth and income category in the first dataflow is set up. Observe that there is a case for each combination of Month of Birth and Income Category, 24 in total. These expressions get evaluated in the order that they appear, and a row stops at the first case it matches; hence, if we assume that Month of Birth and Income Category are uniformly distributed in the dataset, a row is equally likely to stop at any of the 24 cases, and the expected number of expression evaluations for each row is (1 + 24) / 2 = 12.5, the midpoint between the minimum of 1 and the maximum of 24. Now take a look at the screenshots for the second dataflow. We are doing one expression evaluation in DER Income Category, and we have the same 24 cases in CSPL Split by Month of Birth and Income Category as we had before; only the expression differs slightly. In this case, then, we have 1 + 12.5 = 13.5 expected evaluations for each row, which would account for the slightly longer average execution time for this dataflow. Now onto the third dataflow, the quick one.
CSPL Split by Income Category does a maximum of 2 expression evaluations, so the expected number of evaluations per row is 1.5. CSPL Split by Month of Birth 1 and CSPL Split by Month of Birth 2 each have less work to do than the previous Conditional Split components because they only have 12 cases to test for, so the expected number of expression evaluations in each is 6.5. Every row passes through only one of those two month splits, so the total expected number of expression evaluations per row for this dataflow is 1.5 + 6.5 = 8, already fewer than the 12.5 and 13.5 of the first two dataflows. That is only part of the story, though: the conditional expressions in the first two dataflows each have two boolean predicates to evaluate, one for Income Category and one for Month of Birth, whereas the expressions in the Conditional Splits of the third dataflow have only one predicate each, so every evaluation does less work too. To sum up, much of the difference in execution times can be attributed to the difference between MONTH(BirthDate) == 1 && YearlyIncome <= 50000 and MONTH(BirthDate) == 1. In the first two dataflows the income predicate (YearlyIncome <= 50000, or its [IncomeCategory] equivalent in the second dataflow) gets evaluated an average of 12.5 times for every row, whereas in the third dataflow it is evaluated once and once only. Multiply those 11.5 extra operations by 4.7 million rows and you get a significant amount of extra CPU cycles; that’s where our duration difference comes from. The Wrap-up: The obvious point here is that adding new components to a dataflow isn’t necessarily going to make it go any slower; moreover, you may be able to achieve significant improvements by splitting logic over multiple components rather than one. Performance tuning is all about reducing the amount of work that needs to be done, and that doesn’t necessarily mean using fewer components; indeed, sometimes you may be able to reduce workload in ways that aren’t immediately obvious, as I think I have demonstrated here. Of course there are many variables in play here, and your mileage will most definitely vary.
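The per-row arithmetic in the explanation can be checked with a small model. The Python sketch below is not SSIS code and not the author's method; it assumes, as the explanation does, that all 24 (month, income category) combinations are equally likely and that a Conditional Split tests its cases in order and stops at the first match. For the predicate counts only, it additionally assumes that both sides of && are always evaluated. All names (position_of_match, expected, combined and so on) are hypothetical. Counting along the path each row actually takes, it gives 12.5, 13.5 and 8 expected case evaluations per row for the three dataflows.

```python
from fractions import Fraction
from itertools import product

# Hypothetical model of the test data: all 24 (month of birth, income category)
# pairs are assumed to be equally likely.
MONTHS = range(1, 13)
INCOME = ("LessThan50000", "GreaterThan50000")
ROWS = list(product(MONTHS, INCOME))


def position_of_match(cases, row):
    """1-based index of the first case that matches `row`.

    Models a Conditional Split that tests its cases in order and stops at the
    first match, so this is also the number of case expressions evaluated.
    """
    for i, case in enumerate(cases, start=1):
        if case(row):
            return i
    raise ValueError(f"no case matched {row}")


def expected(cost):
    """Exact average of cost(row) over the equally likely rows."""
    return Fraction(sum(cost(row) for row in ROWS), len(ROWS))


# Dataflows 1 and 2: one split with a case per (month, income) pair.
combined = [lambda r, key=key: r == key for key in ROWS]
# Dataflow 3: a two-case income split feeding one of two 12-case month splits.
by_income = [lambda r, c=c: r[1] == c for c in INCOME]
by_month = [lambda r, m=m: r[0] == m for m in MONTHS]

# Expected case (expression) evaluations per row.
cases_1 = expected(lambda r: position_of_match(combined, r))
cases_2 = expected(lambda r: 1 + position_of_match(combined, r))  # +1: Derived Column
cases_3 = expected(lambda r: position_of_match(by_income, r)
                   + position_of_match(by_month, r))  # each row visits ONE month split

# Expected predicate evaluations per row, assuming both sides of && are always
# evaluated: two per combined case, one per case in the third dataflow.
preds_1 = 2 * cases_1
preds_2 = 1 + 2 * (cases_2 - 1)
preds_3 = cases_3

results = {
    "dataflow 1": (cases_1, preds_1),
    "dataflow 2": (cases_2, preds_2),
    "dataflow 3": (cases_3, preds_3),
}
for name, (cases, preds) in results.items():
    print(f"{name}: {float(cases)} case evaluations, "
          f"{float(preds)} predicate evaluations per row")
```

Using exact fractions keeps the averages free of rounding, so they can be compared directly with the figures in the text. The model only counts evaluations; it says nothing about how expensive each predicate is, which is why the measured timings remain the final word.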
I encourage you to download the package and see if you get similar results; let me know in the comments. The package contains all three dataflows plus a fourth dataflow that will create the 2GB raw file for you (you will also need the [AdventureWorksDW2008] sample database from which to source the data). Simply disable all dataflows except the one you want to test before executing the package and remember: execute using dtexec, not within BIDS. If you want to explore dataflow performance tuning in more detail, here are some links you might want to check out: Inequality joins, Asynchronous transformations and Lookups; Destination Adapter Comparison; Don’t turn the dataflow into a cursor; SSIS Dataflow – Designing for performance (webinar). Any comments? Let me know! @Jamiet

    Read the article

  • SSIS Prehistory video

    - by jamiet
    I’m currently spending my Easter bank holiday putting together my presentation SSIS Dataflow Performance Tuning for the upcoming SQL Bits conference in London, and in doing so I’m researching some old material about how the dataflow actually works. Boring as it is, I’ve been easily sidetracked and have chanced upon an old video on Channel 9 entitled Euan Garden - Tour of SQL Server Team (part I). Euan is a former member of the SQL Server team, and in this series of videos he walks the halls of the SQL Server building on Microsoft’s Redmond campus talking to some of the various protagonists; in this one he happens upon the SQL Server Integration Services team. The video was shot in 2004, so this is a fascinating (to me anyway) glimpse into the development of SSIS before it ever shipped, and if you’re a geek like me you’ll really enjoy this behind-the-scenes look at how and why the product was architected. The video is also notable for the presence of the cameraman: none other than the now-rather-more-famous-than-he-was-then Robert Scoble. See it at http://channel9.msdn.com/posts/TheChannel9Team/Euan-Garden-Tour-of-SQL-Server-Team-part-I/ Enjoy! @Jamiet

    Read the article

  • Code formatter for SSMS

    - by blakmk
    I was searching recently for a code formatter for T-SQL and I came across this nice little utility that I wanted to share: http://www.wangz.net/cgi-bin/pp/gsqlparser/sqlpp/sqlformat.tpl I've been dealing with a lot of legacy code lately and there is nothing I find more infuriating than unformatted code. This tool seems to work quite well. Just one click and it formats everything nicely. There is also a free web version.

    Read the article

  • Bash prompt doesn't print until I interact with console again

    - by durron597
    I don't even know where to begin to diagnose this one. Usually, when a command finishes, the prompt prints itself for the next command. However, that is not happening. It's hard to explain in words, so I'll just use an example:
    User@Machine:~$ cp /mnt/mountname/directory/textfile.txt .
    After waiting several seconds (far too long for this operation on a small file) I press Enter, and see:
    User@Machine:~$ cp /mnt/mountname/directory/textfile.txt .
    User@Machine:~$
    User@Machine:~$
    So clearly the operation had finished, but the prompt didn't display... until I pressed Enter, and then BOTH prompts instantly displayed. This error does not happen with commands like cd.

    Read the article

  • Have SSIS' differing type systems ever caused you problems?

    - by jamiet
    One thing that has always infuriated me about SSIS is the fact that every package has three different type systems. To give you an idea of what I am talking about, consider the following: the SSIS dataflow's type system is made up of types called DT_* (e.g. DT_STR, DT_I4); the SSIS variable type system is based on .NET datatypes (e.g. String, Int32); and the types available for the Execute SQL Task's parameters are based on something else, I don't exactly know what (e.g. VARCHAR, LONG). Speaking euphemistically, this is not an optimum situation (were I not speaking euphemistically I would be a lot ruder), and hence I have submitted a suggestion to Connect at [SSIS] Consolidate three type systems into one requesting that it be remedied. This accompanying blog post is not, however, a request for votes (though that would be nice); the reason is actually subtler than that. Let me explain. I have been submitting bugs and suggestions pertaining to SSIS for years and have, so far, submitted over 200 Connect items. If that experience has taught me anything it is this: Connect items are not generally actioned because they are considered "nice to have". No, SSIS Connect items get actioned because they cause customers grief, and if I am perfectly honest I must admit that, other than being a bit gnarly, SSIS' three type system architecture has never knowingly caused me any significant problems. The reason for this blog post is to ask: has any reader out there ever encountered problems on account of SSIS' three type systems, or have you, like me, never found them to be a problem? Errors or performance degradation caused by implicit type conversions would, I believe, present a strong case for getting this situation remedied in a future version of SSIS, so if you HAVE encountered such problems I would encourage you to leave a comment on the Connect submission accordingly. Let me know in the comments too; I would be interested to hear others' opinions on this. @Jamiet

    Read the article

  • New SSIS features and enhancements in Denali – a webinar on 28th June in association with Pragmatic Works

    - by jamiet
    Tomorrow I shall be presenting a webinar entitled “New SSIS features and enhancements in Denali”. The webinar is being hosted by Pragmatic Works and you can sign up for it at Pragmatic Works webinars. The webinar will start at 1930 BST and you can view the time for your timezone at this link: http://www.timeanddate.com/worldclock/fixedtime.html?msg=New+SSIS+features+and+enhancements+in+Denali&iso=20110628T1830 The webinar was arranged a few months ago, and at that time we were hoping that the next Community Technology Preview (CTP) of SQL Server Denali would be available for public consumption; unfortunately it transpires that that is not yet the case, and hence I will be presenting new features of CTP1, which was released at the start of this year. If you’re not yet familiar with the new features of SSIS that are coming in the next release of SQL Server, then please do come and join the webinar. @Jamiet

    Read the article

  • SSIS Reporting Pack – a performance tip

    - by jamiet
    SSIS Reporting Pack is a suite of open source SQL Server Reporting Services (SSRS) reports that provide additional insight into the SQL Server Integration Services (SSIS) 2012 Catalog. You can read more about SSIS Reporting Pack here on my blog or head over to the home page for the project at http://ssisreportingpack.codeplex.com/. After having used SSIS Reporting Pack on a real project for a few months now, I have come to realise that if you have any sizeable data volumes in [SSISDB], the reports in SSIS Reporting Pack will suffer from chronic performance problems; I have seen the “execution” report take upwards of 30 minutes to return data. To combat this I highly recommend that you create indexes on the [SSISDB].[internal].[event_messages].[operation_id] and [SSISDB].[internal].[operation_messages].[operation_id] columns. Phil Brammer has experienced similar problems himself and has since made it easy for the rest of us by preparing some scripts to create the indexes that he recommends; he has shared those scripts via his blog at http://www.ssistalk.com/SSIS_2012_Missing_Indexes.zip. If you are using SSIS Reporting Pack, or even if you are simply querying [SSISDB], I highly recommend that you download Phil’s scripts and test them out on your own SSIS Catalog(s). Those indexes will not solve all problems, but they will make some of your reports run quicker. I am working on some further enhancements that should improve the performance of the reports. Watch this space. @Jamiet

    Read the article

  • Always use dtexec.exe to test performance of your dataflows. No exceptions.

    - by jamiet
    Earlier this evening I posted a blog post entitled Investigation: Can different combinations of components effect Dataflow performance? in which I compared the performance of three different dataflows all working towards the same overall goal. I wanted to make one last point related to the results, but I thought it warranted a blog post all of its own. Here is a screenshot of one of the dataflows that I was testing. Pretty complicated, I’m sure you’ll agree. When I executed this dataflow in the test it took ~19 seconds; in that case I was executing using the command-line tool dtexec. I also tried executing it inside the BIDS development environment, and there it took much longer: 139 seconds. That’s more than seven times as long. The point I want to make is very simple: if you are testing your dataflows for performance, use dtexec. Nothing else will suffice. @Jamiet
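The advice above, together with the investigation's method of running each dataflow 10 times and comparing averages and standard deviations, can be scripted. The Python sketch below is one possible harness, not the author's: it times whole process runs with wall-clock time, so dtexec start-up and package validation are included in every figure, and the function names time_runs and summarise are hypothetical. The only dtexec option the usage example relies on is /F, which runs a package stored in a .dtsx file.

```python
import statistics
import subprocess
import time


def time_runs(command, runs=10):
    """Run `command` `runs` times; return each run's wall-clock duration in seconds.

    check=True raises CalledProcessError on a non-zero exit code, so a failed
    package execution is never mistaken for a fast one.
    """
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarise(durations):
    """Return (mean, sample standard deviation); needs at least two runs."""
    return statistics.mean(durations), statistics.stdev(durations)
```

For example, summarise(time_runs(["dtexec", "/F", r"C:\Packages\DataflowTest.dtsx"])), with a hypothetical package path, would report the mean and standard deviation of ten command-line executions of that package. Passing the command as a list avoids shell quoting issues with paths that contain spaces.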

    Read the article
