Search Results

Search found 58774 results on 2351 pages for 'ssis data tranformations'.

Page 3/2351 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • SSIS - Access Denied with UNC paths - The file name is a device or contains invalid characters

    - by simonsabin
    I spent another day tearing my hair out yesterday trying to resolve an issue with SSIS packages runnning in SQLAgent (not got much left at the moment, maybe I should contact the SSIS team for a wig). My situation was that I am deploying packages to a development server, and to provide isolation I was running jobs with a proxy account that only had access to the development servers. Proxies are an awesome feature and mean that you should never have to "just run the job as sysadmin". The issue I was facing was that the job step was failing. The job step was a simple execution of the package.The following errors appeared in my log file. I always check the "Log step output in history" for a job step, this ensures you get all the output from the command that you run. I'll blog about this later. If looking at the output in sysdtslog90 then you will have an entry with datacode -1073573533 and error message File or directory "<filename>" represented by connection "<connection>" does not exist.  Not exactly helpful. If you get the output from the console then you will also get these errors. 0xC0202070 "The file name property is not valid. The file name is a device or contains invalid characters." 0xC001401E "specified in the connection was not valid." It appears this error is due to the use of a UNC path and the account runnnig the package not having access to all the folders in the path. Solution To solve this you need to ensure that the proxy account has access to ALL folders in the path you are accessing. To check this works, logon as the relevant proxy user, or run a command window as the specified user. Then try and do net use \\server\share and then do a dir for each folder in the path and check you have access. If these work and you still have the problem then you have some other problem, sorry. The following are posts on experts exchange that also discuss this,http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/SSIS/Q_24056047.htmlhttp://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/SSIS/Q_23968903.html This blog had a post about it being a 64 bit issue. That definitely wasn't the issue for me as I was on a 32 bit server http://blogs.perkinsconsulting.com/post/64-bit-SQL-Server-2005-SSIS-and-UNC-paths-Part-2.aspx  

    Read the article

  • Dynamic Unpivot : SSIS Nugget

    - by jamiet
    A question on the SSIS forum earlier today asked: I need to dynamically unpivot some set of columns in my source file. Every month there is one new column and its set of Values. I want to unpivot it without editing my SSIS packages that is deployed Let’s be clear about what we mean by Unpivot. It is a normalisation technique that basically converts columns into rows. By way of example it converts something like this: AccountCode Jan Feb Mar AC1 100.00 150.00 125.00 AC2 45.00 75.50 90.00 into something like this: AccountCode Month Amount AC1 Jan 100.00 AC1 Feb 150.00 AC1 Mar 125.00 AC2 Jan 45.00 AC2 Feb 75.50 AC2 Mar 90.00 The Unpivot transformation in SSIS is perfectly capable of carrying out the operation defined in this example however in the case outlined in the aforementioned forum thread the problem was a little bit different. I interpreted it to mean that the number of columns could change and in that scenario the Unpivot transformation (and indeed the SSIS dataflow in general) is rendered useless because it expects that the number of columns will not change from what is specified at design-time. There is a workaround however. Assuming all of the columns that CAN exist will appear at the end of the rows, we can (1) import all of the columns in the file as just a single column, (2) use a script component to loop over all the values in that “column” and (3) output each one as a column all of its own. Let’s go over that in a bit more detail.   I’ve prepared a data file that shows some data that we want to unpivot which shows some customers and their mythical shopping lists (it has column names in the first row): We use a Flat File Connection Manager to specify the format of our data file to SSIS: and a Flat File Source Adapter to put it into the dataflow (no need a for a screenshot of that one – its very basic). Notice that the values that we want to unpivot all exist in a column called [Groceries]. Now onto the script component where the real work goes on, although the code is pretty simple: Here I show a screenshot of this executing along with some data viewers. As you can see we have successfully pulled out all of the values into a row all of their own thus accomplishing the Dynamic Unpivot that the forum poster was after. If you want to run the demo for yourself then I have uploaded the demo package and source file up to my SkyDrive: http://cid-550f681dad532637.skydrive.live.com/self.aspx/Public/BlogShare/20100529/Dynamic%20Unpivot.zip Simply extract the two files into a folder, make sure the Connection Manager is pointing to the file, and execute! Hope this is useful. @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • Five things SSIS should drop

    - by jamiet
    There’s a current SQL Server meme going round entitled Five things SQL Server should drop and, whilst no-one tagged me to write anything, I couldn’t resist doing the same for SQL Server Integration Services. So, without further ado, here are five things that I think should be dropped from SSIS.Data source connectionsSeriously, does anyone use these? I know why they’re there. Someone sat in a meeting back in the early part of the last decade and said “Ooo, Reporting Services and Analysis Services have these things called Data Sources. If we used them in Integration Services then we’d have a really cool integration story.” Errr….no.Web Service TaskDitto. If you want to do anything useful against anything but the simplest of SOAP web services steer well clear of this peculiar SSIS additionActiveX Script TaskAnother task that I suspect has never seen the light of day in a SSIS package. It was billed as a way of running upgraded DTS2000 ActiveX scripts in SSIS – sounds good except for one thing. Anytime one of those scripts would try to talk to the DTS object model (which they all do – otherwise what’s the point) then they will error out. This one has always been a real head scratcher.Slow Changing Dimension wizardI suspect I may get some push back on this one but I’m mentioning it anyway. Some people like the SCD wizard; I am not one of those people! Everything that the SCD component does can easily be reproduced using other components and from a performance point of view its much more beneficial to use those alternatives.Multifile Connection ManagerImagining buying a house that came with a set of keys that didn’t open any of the doors. Sounds ridiculous right? How about a SSIS Connection Manager that doesn’t get used by any of the tasks or components. Ah, that’ll be the Multifile Connection Manager then!Comments are of course welcome. Diatribes are assumed :)@Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • Export all SSIS packages from msdb using Powershell

    - by jamiet
    Have you ever wanted to dump all the SSIS packages stored in msdb out to files? Of course you have, who wouldn’t? Right? Well, at least one person does because this was the subject of a thread (save all ssis packages to file) on the SSIS forum earlier today. Some of you may have already figured out a way of doing this but for those that haven’t here is a nifty little script that will do it for you and it uses our favourite jack-of-all tools … Powershell!! Imagine I have the following package folder structure on my Integration Services server (i.e. in [msdb]): There are two packages in there called “20110111 Chaining Expression components” & “Package”, I want to export those two packages into a folder structure that mirrors that in [msdb]. Here is the Powershell script that will do that:Param($SQLInstance = "localhost") #####Add all the SQL goodies (including Invoke-Sqlcmd)##### add-pssnapin sqlserverprovidersnapin100 -ErrorAction SilentlyContinue add-pssnapin sqlservercmdletsnapin100 -ErrorAction SilentlyContinue cls $Packages = Invoke-Sqlcmd -MaxCharLength 10000000 -ServerInstance $SQLInstance -Query "WITH cte AS ( SELECT cast(foldername as varchar(max)) as folderpath, folderid FROM msdb..sysssispackagefolders WHERE parentfolderid = '00000000-0000-0000-0000-000000000000' UNION ALL SELECT cast(c.folderpath + '\' + f.foldername as varchar(max)), f.folderid FROM msdb..sysssispackagefolders f INNER JOIN cte c ON c.folderid = f.parentfolderid ) SELECT c.folderpath,p.name,CAST(CAST(packagedata AS VARBINARY(MAX)) AS VARCHAR(MAX)) as pkg FROM cte c INNER JOIN msdb..sysssispackages p ON c.folderid = p.folderid WHERE c.folderpath NOT LIKE 'Data Collector%'" Foreach ($pkg in $Packages) { $pkgName = $Pkg.name $folderPath = $Pkg.folderpath $fullfolderPath = "c:\temp\$folderPath\" if(!(test-path -path $fullfolderPath)) { mkdir $fullfolderPath | Out-Null } $pkg.pkg | Out-File -Force -encoding ascii -FilePath "$fullfolderPath\$pkgName.dtsx" } To run it simply change the “localhost” parameter of the server you want to connect to either by editing the script or passing it in when the script is executed. It will create the folder structure in C:\Temp (which you can also easily change if you so wish – just edit the script accordingly). Here’s the folder structure that it created for me: Notice how it is a mirror of the folder structure in [msdb]. Hope this is useful! @Jamiet UPDATE: THis post prompted Chad Miller to write a post describing his Powershell add-in that utilises a SSIS API to do exporting of packages. Go take a read here: http://sev17.com/2011/02/importing-and-exporting-ssis-packages-using-powershell/

    Read the article

  • Reading data from an Entity Framework data model through a WCF Data Service

    - by nikolaosk
    This is going to be the fourth post of a series of posts regarding ASP.Net and the Entity Framework and how we can use Entity Framework to access our datastore. You can find the first one here , the second one here and the third one here . I have a post regarding ASP.Net and EntityDataSource. You can read it here .I have 3 more posts on Profiling Entity Framework applications. You can have a look at them here , here and here . Microsoft with .Net 3.0 Framework, introduced WCF. WCF is Microsoft's...(read more)

    Read the article

  • Big Data&rsquo;s Killer App&hellip;

    - by jean-pierre.dijcks
    Recently Keith spent  some time talking about the cloud on this blog and I will spare you my thoughts on the whole thing. What I do want to write down is something about the Big Data movement and what I think is the killer app for Big Data... Where is this coming from, ok, I confess... I spent 3 days in cloud land at the Cloud Connect conference in Santa Clara and it was quite a lot of fun. One of the nice things at Cloud Connect was that there was a track dedicated to Big Data, which prompted me to some extend to write this post. What is Big Data anyways? The most valuable point made in the Big Data track was that Big Data in itself is not very cool. Doing something with Big Data is what makes all of this cool and interesting to a business user! The other good insight I got was that a lot of people think Big Data means a single gigantic monolithic system holding gazillions of bytes or documents or log files. Well turns out that most people in the Big Data track are talking about a lot of collections of smaller data sets. So rather than thinking "big = monolithic" you should be thinking "big = many data sets". This is more than just theoretical, it is actually relevant when thinking about big data and how to process it. It is important because it means that the platform that stores data will most likely consist out of multiple solutions. You may be storing logs on something like HDFS, you may store your customer information in Oracle and you may store distilled clickstream information in some distilled form in MySQL. The big question you will need to solve is not what lives where, but how to get it all together and get some value out of all that data. NoSQL and MapReduce Nope, sorry, this is not the killer app... and no I'm not saying this because my business card says Oracle and I'm therefore biased. I think language is important, but as with storage I think pragmatic is better. In other words, some questions can be answered with SQL very efficiently, others can be answered with PERL or TCL others with MR. History should teach us that anyone trying to solve a problem will use any and all tools around. For example, most data warehouses (Big Data 1.0?) get a lot of data in flat files. Everyone then runs a bunch of shell scripts to massage or verify those files and then shoves those files into the database. We've even built shell script support into external tables to allow for this. I think the Big Data projects will do the same. Some people will use MapReduce, although I would argue that things like Cascading are more interesting, some people will use Java. Some data is stored on HDFS making Cascading the way to go, some data is stored in Oracle and SQL does do a good job there. As with storage and with history, be pragmatic and use what fits and neither NoSQL nor MR will be the one and only. Also, a language, while important, does in itself not deliver business value. So while cool it is not a killer app... Vertical Behavioral Analytics This is the killer app! And you are now thinking: "what does that mean?" Let's decompose that heading. First of all, analytics. I would think you had guessed by now that this is really what I'm after, and of course you are right. But not just analytics, which has a very large scope and means many things to many people. I'm not just after Business Intelligence (analytics 1.0?) or data mining (analytics 2.0?) but I'm after something more interesting that you can only do after collecting large volumes of specific data. That all important data is about behavior. What do my customers do? More importantly why do they behave like that? If you can figure that out, you can tailor web sites, stores, products etc. to that behavior and figure out how to be successful. Today's behavior that is somewhat easily tracked is web site clicks, search patterns and all of those things that a web site or web server tracks. that is where the Big Data lives and where these patters are now emerging. Other examples however are emerging, and one of the examples used at the conference was about prediction churn for a telco based on the social network its members are a part of. That social network is not about LinkedIn or Facebook, but about who calls whom. I call you a lot, you switch provider, and I might/will switch too. And that just naturally brings me to the next word, vertical. Vertical in this context means per industry, e.g. communications or retail or government or any other vertical. The reason for being more specific than just behavioral analytics is that each industry has its own data sources, has its own quirky logic and has its own demands and priorities. Of course, the methods and some of the software will be common and some will have both retail and service industry analytics in place (your corner coffee store for example). But the gist of it all is that analytics that can predict customer behavior for a specific focused group of people in a specific industry is what makes Big Data interesting. Building a Vertical Behavioral Analysis System Well, that is going to be interesting. I have not seen much going on in that space and if I had to have some criticism on the cloud connect conference it would be the lack of concrete user cases on big data. The telco example, while a step into the vertical behavioral part is not really on big data. It used a sample of data from the customers' data warehouse. One thing I do think, and this is where I think parts of the NoSQL stuff come from, is that we will be doing this analysis where the data is. Over the past 10 years we at Oracle have called this in-database analytics. I guess we were (too) early? Now the entire market is going there including companies like SAS. In-place btw does not mean "no data movement at all", what it means that you will do this on data's permanent home. For SAS that is kind of the current problem. Most of the inputs live in a data warehouse. So why move it into SAS and back? That all worked with 1 TB data warehouses, but when we are looking at 100TB to 500 TB of distilled data... Comments? As it is still early days with these systems, I'm very interested in seeing reactions and thoughts to some of these thoughts...

    Read the article

  • Accelerate your SOA with Data Integration - Live Webinar Tuesday!

    - by dain.hansen
    Need to put wind in your SOA sails? Organizations are turning more and more to Real-time data integration to complement their Service Oriented Architecture. The benefit? Lowering costs through consolidating legacy systems, reducing risk of bad data polluting their applications, and shortening the time to deliver new service offerings. Join us on Tuesday April 13th, 11AM PST for our live webinar on the value of combining SOA and Data Integration together. In this webcast you'll learn how to innovate across your applications swiftly and at a lower cost using Oracle Data Integration technologies: Oracle Data Integrator Enterprise Edition, Oracle GoldenGate, and Oracle Data Quality. You'll also hear: Best practices for building re-usable data services that are high performing and scalable across the enterprise How real-time data integration can maximize SOA returns while providing continuous availability for your mission critical applications Architectural approaches to speed service implementation and delivery times, with pre-integrations to CRM, ERP, BI, and other packaged applications Register now for this live webinar!

    Read the article

  • What is SSIS order of data transformation component method calls

    - by Ron Ruble
    I am working on a custom data transformation component. I'm using NUnit and NMock2 to test as I code. Testing and getting the custom UI and other features right is a huge pain, in part because I can't find any documentation about the order in which SSIS invokes methods on the component at design time as well as runtime. I can correct the issues readily enough, but it's tedious and time consuming to unregister the old version, register the new version, fire up the test ssis package, try to display the UI, get an obscure error message, backtrace it, modify the component and continue. One of the big issues involves the UI component needing access to the componentmetadata and buffermanager properties of the component at design time, and what I need to provide for to support properties that won't be initialized until after the user enters them in the UI. I can work through it; but if someone knows of some docs or tips that would speed me up, I'd greatly appreciate it. The samples I've found havn't been much use; they seem to be directed to showing off cool stuff (Twitter, weather.com) rather than actual work. Thanks in advance.

    Read the article

  • Big Data – Basics of Big Data Analytics – Day 18 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the various components in Big Data Story. In this article we will understand what are the various analytics tasks we try to achieve with the Big Data and the list of the important tools in Big Data Story. When you have plenty of the data around you what is the first thing which comes to your mind? “What do all these data means?” Exactly – the same thought comes to my mind as well. I always wanted to know what all the data means and what meaningful information I can receive out of it. Most of the Big Data projects are built to retrieve various intelligence all this data contains within it. Let us take example of Facebook. When I look at my friends list of Facebook, I always want to ask many questions such as - On which date my maximum friends have a birthday? What is the most favorite film of my most of the friends so I can talk about it and engage them? What is the most liked placed to travel my friends? Which is the most disliked cousin for my friends in India and USA so when they travel, I do not take them there. There are many more questions I can think of. This illustrates that how important it is to have analysis of Big Data. Here are few of the kind of analysis listed which you can use with Big Data. Slicing and Dicing: This means breaking down your data into smaller set and understanding them one set at a time. This also helps to present various information in a variety of different user digestible ways. For example if you have data related to movies, you can use different slide and dice data in various formats like actors, movie length etc. Real Time Monitoring: This is very crucial in social media when there are any events happening and you wanted to measure the impact at the time when the event is happening. For example, if you are using twitter when there is a football match, you can watch what fans are talking about football match on twitter when the event is happening. Anomaly Predication and Modeling: If the business is running normal it is alright but if there are signs of trouble, everyone wants to know them early on the hand. Big Data analysis of various patterns can be very much helpful to predict future. Though it may not be always accurate but certain hints and signals can be very helpful. For example, lots of data can help conclude that if there is lots of rain it can increase the sell of umbrella. Text and Unstructured Data Analysis: unstructured data are now getting norm in the new world and they are a big part of the Big Data revolution. It is very important that we Extract, Transform and Load the unstructured data and make meaningful data out of it. For example, analysis of lots of images, one can predict that people like to use certain colors in certain months in their cloths. Big Data Analytics Solutions There are many different Big Data Analystics Solutions out in the market. It is impossible to list all of them so I will list a few of them over here. Tableau – This has to be one of the most popular visualization tools out in the big data market. SAS – A high performance analytics and infrastructure company IBM and Oracle – They have a range of tools for Big Data Analysis Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Data Scientist. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • SSIS Expression Tester Tool

    - by Davide Mauri
    Thanks to my friend's Doug blog I’ve found a very nice tool made by fellow MVP Darren Green which really helps to make SSIS develoepers life easier: http://expressioneditor.codeplex.com/Wikipage?ProjectName=expressioneditor In brief the tool allow the testing of SSIS Expression so that one can evaluate and test them before using in SSIS packages. Cool and useful! Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • SSIS Expression Tester Tool

    - by Davide Mauri
    Thanks to my friend's Doug blog I’ve found a very nice tool made by fellow MVP Darren Green which really helps to make SSIS develoepers life easier: http://expressioneditor.codeplex.com/Wikipage?ProjectName=expressioneditor In brief the tool allow the testing of SSIS Expression so that one can evaluate and test them before using in SSIS packages. Cool and useful! Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • Announcing Four Weeks of SSIS MicroTraining!

    - by andyleonard
    For the next four Tuesdays – 29 Nov, 6 Dec, 13 Dec, and 20 Dec – I will deliver a 30 – 45 minute presentation beginning at 11:00 AM EST on Google+ . Please note Google+ limits attendance to the first ten people who join the Hangout and I have no control over who gets in. The topics will be: 29 Nov – “I See a Control Flow. Now What?” (Creating Your First SSIS Package) 6 Dec – The Incremental Load SSIS Design Pattern 13 Dec – Flat File Fu 20 Dec – SSIS Frameworks Magic Details 29 Nov – “I See a Control...(read more)

    Read the article

  • Attunity Oracle CDC Solution for SSIS - Beta

    We in no way work for Attunity but we were asked to test drive a beta version of their Oracle CDC solution for SSIS.  Everybody should know that moving more data than you need to takes too much time and uses resources that may better be employed doing something else.  Change data Capture is a technology that is designed to help you identify only the data that has had something done to it and you can therefore move only what is needed.  Microsoft have implemented this exact functionality into SQL server 2008 and I really like it there.  Attunity though are doing it on Oracle. DISCLAIMER: This is a BETA release and some of the parts are a bit ugly/difficult to work with.  The idea though is definitely right and the product once working does exactly what it says on the tin.  They have always been helpful to me when I have had a problem with the product and if that continues then beta testing pain should be eased somewhat. In due course I am going to be doing some videos around me using the product.  If you use Oracle and SSIS then give it a go. Here is their product description.   Attunity is a Microsoft SQL Server technology partner and the creator of the Microsoft Connectors for Oracle and Teradata, currently available in SQL Server 2008 Enterprise Edition. Attunity released a beta version of the Attunity Oracle-CDC for SSIS, a product that integrates continually changing Oracle data into SSIS, efficiently and in real-time. Attunity designed the product and integrated it into SSIS to create the simple creation of change data capture (CDC) solutions, accelerate implementation time, and reduce resources and costs. They also utilize log-based CDC so the solution has minimal impact on the Oracle source system. You can use the product to implement enterprise-class data replication, synchronization, and real-time business intelligence (BI) and data warehousing projects, quickly and efficiently, leveraging their existing SQL Server investments and resource skills. Attunity architected the product specifically for the Microsoft SSIS developer community and the product is available for both SQL Server 2005 and SQL Server 2008. It offers the following key capabilities: · Log-based, non-intrusive Oracle CDC · Full integration into SSIS and the Business Intelligence Developer Studio · Automatic generation of SSIS packages for CDC as well as full-loads of Oracle data · Filtering of Oracle tables and columns at the source · Monitoring and control of CDC processing Click to learn more and download the beta.

    Read the article

  • Big Data – What is Big Data – 3 Vs of Big Data – Volume, Velocity and Variety – Day 2 of 21

    - by Pinal Dave
    Data is forever. Think about it – it is indeed true. Are you using any application as it is which was built 10 years ago? Are you using any piece of hardware which was built 10 years ago? The answer is most certainly No. However, if I ask you – are you using any data which were captured 50 years ago, the answer is most certainly Yes. For example, look at the history of our nation. I am from India and we have documented history which goes back as over 1000s of year. Well, just look at our birthday data – atleast we are using it till today. Data never gets old and it is going to stay there forever.  Application which interprets and analysis data got changed but the data remained in its purest format in most cases. As organizations have grown the data associated with them also grew exponentially and today there are lots of complexity to their data. Most of the big organizations have data in multiple applications and in different formats. The data is also spread out so much that it is hard to categorize with a single algorithm or logic. The mobile revolution which we are experimenting right now has completely changed how we capture the data and build intelligent systems.  Big organizations are indeed facing challenges to keep all the data on a platform which give them a  single consistent view of their data. This unique challenge to make sense of all the data coming in from different sources and deriving the useful actionable information out of is the revolution Big Data world is facing. Defining Big Data The 3Vs that define Big Data are Variety, Velocity and Volume. Volume We currently see the exponential growth in the data storage as the data is now more than text data. We can find data in the format of videos, musics and large images on our social media channels. It is very common to have Terabytes and Petabytes of the storage system for enterprises. As the database grows the applications and architecture built to support the data needs to be reevaluated quite often. Sometimes the same data is re-evaluated with multiple angles and even though the original data is the same the new found intelligence creates explosion of the data. The big volume indeed represents Big Data. Velocity The data growth and social media explosion have changed how we look at the data. There was a time when we used to believe that data of yesterday is recent. The matter of the fact newspapers is still following that logic. However, news channels and radios have changed how fast we receive the news. Today, people reply on social media to update them with the latest happening. On social media sometimes a few seconds old messages (a tweet, status updates etc.) is not something interests users. They often discard old messages and pay attention to recent updates. The data movement is now almost real time and the update window has reduced to fractions of the seconds. This high velocity data represent Big Data. Variety Data can be stored in multiple format. For example database, excel, csv, access or for the matter of the fact, it can be stored in a simple text file. Sometimes the data is not even in the traditional format as we assume, it may be in the form of video, SMS, pdf or something we might have not thought about it. It is the need of the organization to arrange it and make it meaningful. It will be easy to do so if we have data in the same format, however it is not the case most of the time. The real world have data in many different formats and that is the challenge we need to overcome with the Big Data. This variety of the data represent  represent Big Data. Big Data in Simple Words Big Data is not just about lots of data, it is actually a concept providing an opportunity to find new insight into your existing data as well guidelines to capture and analysis your future data. It makes any business more agile and robust so it can adapt and overcome business challenges. Tomorrow In tomorrow’s blog post we will try to answer discuss Evolution of Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Required Parameters [SSIS Denali]

    - by jamiet
    SQL Server Integration Services (SSIS) in its 2005 and 2008 incarnations expects you to set a property values within your package at runtime using Configurations. SSIS developers tend to have rather a lot of issues with SSIS configurations; in this blog post I am going to highlight one of those problems and how it has been alleviated in SQL Server code-named Denali.   A configuration is a property path/value pair that exists outside of a package, typically within SQL Server or in a collection of one or more configurations in a file called a .dtsConfig file. Within the package one defines a pointer to a configuration that says to the package “When you execute, go and get a configuration value from this location” and if all goes well the package will fetch that configuration value as it starts to execute and you will see something like the following in your output log: Information: 0x40016041 at Package: The package is attempting to configure from the XML file "C:\Configs\MyConfig.dtsConfig". Unfortunately things DON’T always go well, perhaps the .dtsConfig file is unreachable or the name of the SQL Sever holding the configuration value has been defined incorrectly – any one of a number of things can go wrong. In this circumstance you might see something like the following in your log output instead: Warning: 0x80012014 at Package: The configuration file "C:\Configs\MyConfig.dtsConfig" cannot be found. Check the directory and file name. The problem that I want to draw attention to here though is that your package will ignore the fact it can’t find the configuration and executes anyway. This is really really bad because the package will not be doing what it is supposed to do and worse, if you have not isolated your environments you might not even know about it. Can you imagine a package executing for months and all the while inserting data into the wrong server? Sounds ridiculous but I have absolutely seen this happen and the root cause was that no-one picked up on configuration warnings like the one above. Happily in SSIS code-named Denali this problem has gone away as configurations have been replaced with parameters. Each parameter has a property called ‘Required’: Any parameter with Required=True must have a value passed to it when the package executes. Any attempt to execute the package will result in an error. Here we see that error when attempting to execute using the SSMS UI: and similarly when executing using T-SQL: Error is: Msg 27184, Level 16, State 1, Procedure prepare_execution, Line 112 In order to execute this package, you need to specify values for the required parameters.   As you can see, SSIS code-named Denali has mechanisms built-in to prevent the problem I described at the top of this blog post. Specifying a Parameter required means that any packages in that project cannot execute until a value for the parameter has been supplied. This is a very good thing. I am loathe to make recommendations so early in the development cycle but right now I’m thinking that all Project Parameters should have Required=True, certainly any that are used to define external locations should be anyway. @Jamiet

    Read the article

  • Have SSIS' differing type systems ever caused you problems?

    - by jamiet
    One thing that has always infuriated me about SSIS is the fact that every package has three different type systems; to give you an idea of what I am talking about consider the following: The SSIS dataflow's type system is made up of types called DT_*  (e.g. DT_STR, DT_I4) The SSIS variable type system is based on .Net datatypes (e.g. String, Int32) The types available for Execute SQL Task's parameters are based on something else - I don't exactly know what (e.g. VARCHAR, LONG) Speaking euphemistically ... this is not an optimum situation (were I not speaking euphemistically I would be a lot ruder) and hence I have submitted a suggestion to Connect at [SSIS] Consolidate three type systems into one requesting that it be remedied. This accompanying blog post is not however a request for votes (though that would be nice); the reason is actually subtler than that. Let me explain. I have been submitting bugs and suggestions pertaining to SSIS for years and have, so far, submitted over 200 Connect items. If that experience has taught me anything it is this - Connect items are not generally actioned because they are considered "nice to have". No, SSIS Connect items get actioned because they cause customers grief and if I am perfectly honest I must admit that, other than being a bit gnarly, SSIS' three type system architecture has never knowingly caused me any significant problems. The reason for this blog post is to ask if any reader out there has ever encountered any problems on account of SSIS' three type systems or have you, like me, never found them to be a problem? Errors or performance degredation caused by implicit type conversions would, I believe, present a strong case for getting this situation remedied in a future version of SSIS so if you HAVE encountered such problems I would encourage you to leave a comment on the Connect submission accordingly. Let me know in the comments too - I would be interested to hear others' opinions on this. @Jamiet

    Read the article

  • The Data Scientist

    - by BuckWoody
    A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I think it means, and why I’m excited about it. In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. The term Data Scientist (as far as I can make out this early in it’s use) is someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. Persistence The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization. Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. Processing Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. Presentation This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understanding why the statistical formula works the way it does. And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. Why I’m excited I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. So  watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way.

    Read the article

  • SSIS - Connect to Oracle on a 64-bit machine (Updated for SSIS 2008 R2)

    - by jorg
    We recently had a few customers where a connection to Oracle on a 64 bit machine was necessary. A quick search on the internet showed that this could be a big problem. I found all kind of blog and forum posts of developers complaining about this. A lot of developers will recognize the following error message: Test connection failed because of an error in initializing provider. Oracle client and networking components were not found. These components are supplied by Oracle Corporation and are part of the Oracle Version 7.3.3 or later client software installation. Provider is unable to function until these components are installed. After a lot of searching, trying and debugging I think I found the right way to do it! Problems Because BIDS is a 32 bit application, as well on 32 as on 64 bit machines, it cannot see the 64 bit driver for Oracle. Because of this, connecting to Oracle from BIDS on a 64 bit machine will never work when you install the 64 bit Oracle client. Another problem is the "Microsoft Provider for Oracle", this driver only exists in a 32 bit version and Microsoft has no plans to create a 64 bit one in the near future. The last problem I know of is in the Oracle client itself, it seems that a connection will never work with the instant client, so always use the full client. There are also a lot of problems with the 10G client, one of it is the fact that this driver can't handle the "(x86)" in the path of SQL Server. So using the 10G client is no option! Solution Download the Oracle 11G full client. Install the 32 AND the 64 bit version of the 11G full client (Installation Type: Administrator) and reboot the server afterwards. The 32 bit version is needed for development from BIDS with is 32 bit, the 64 bit version is needed for production with the SQLAgent, which is 64 bit. Configure the Oracle clients (both 32 and 64 bits) by editing  the files tnsnames.ora and sqlnet.ora. Try to do this with an Oracle DBA or, even better, let him/her do this. Use the "Oracle provider for OLE DB" from SSIS, don't use the "Microsoft Provider for Oracle" because a 64 bit version of it does not exist. Schedule your packages with the SQLAgent. Background information Visual Studio (BI Dev Studio)is a 32bit application. SQL Server Management Studio is a 32bit application. dtexecui.exe is a 32bit application. dtexec.exe has both 32bit and 64bit versions. There are x64 and x86 versions of the Oracle provider available. SQLAgent is a 64bit process. My advice to BI consultants is to get an Oracle DBA or professional for the installation and configuration of the 2 full clients (32 and 64 bit). Tell the DBA to download the biggest client available, this way you are sure that they pick the right one ;-) Testing if the clients have been installed and configured in the right way can be done with Windows ODBC Data Source Administrator: Start... Programs... Administrative tools... Data Sources (ODBC) ADITIONAL STEPS FOR SSIS 2008 R2 It seems that, unfortunately, some additional steps are necessary for SQL Server 2008 R2 installations: 1. Open REGEDIT (Start… Run… REGEDIT) on the server and search for the following entry (for the 32 bits driver): HKEY_LOCAL_MACHINE\Software\Microsoft\MSDTC\MTxOCI Make sure the following values are entered: 2. Next, search for (for the 64 bits driver): HKEY_LOCAL_MACHINE\Software\Wow6432Node\Microsoft\MSDTC\MTxOCI Make sure the same values as above are entered. 3. Reboot your server.

    Read the article

  • SSIS code smell – Unused columns in the dataflow

    - by jamiet
    A code smell is defined on Wikipedia as being a “symptom in the source code of a program that possibly indicates a deeper problem”. It’s a term commonly used by our code-writing brethren to describe sub-optimal code but I think the term can be applied equally well to SSIS packages too as I shall now explain One of my pet hates about SSIS development is packages that throw warnings of the form: The output column "ColumnName" (1358) on output "OLE DB Source Output" (1289) and component "OLE_SRC Name" (1279) is not subsequently used in the Data Flow task. Removing this unused output column can increase Data Flow task performance.  The warning is fairly self-explanatory – any column that appears in the data flow but doesn’t get used will throw this warning when the data flow is executed. Its not the negligible performance degradation that they cause that bothers me though, it’s the clutter that they cause in your log file/table. Take a look at the following screenshot if you don’t believe me: There are 231409 such warnings in the system that I took this screenshot from, that is 231409 log records that should not be there. The most infuriating thing about this warning is that it is so easily avoidable; eliminating such columns is a very quick and easy thing to do in the SSIS Designer. The only problem I see is that the warnings don’t occur until you execute the package – it would be preferable for the designer to have an unobtrusive way of informing you of them as well. Anyway, I digress… I consider such warnings to be a code smell because, to me, they’re symptomatic of a lack of due care and attention; a lack of developer discipline if you will. What other code smells can you think of when building SSIS packages? If I get a good list in the comments maybe I’ll compile them into a later blog post. @Jamiet Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • Fraud Detection with the SQL Server Suite Part 2

    - by Dejan Sarka
    This is the second part of the fraud detection whitepaper. You can find the first part in my previous blog post about this topic. My Approach to Data Mining Projects It is impossible to evaluate the time and money needed for a complete fraud detection infrastructure in advance. Personally, I do not know the customer’s data in advance. I don’t know whether there is already an existing infrastructure, like a data warehouse, in place, or whether we would need to build one from scratch. Therefore, I always suggest to start with a proof-of-concept (POC) project. A POC takes something between 5 and 10 working days, and involves personnel from the customer’s site – either employees or outsourced consultants. The team should include a subject matter expert (SME) and at least one information technology (IT) expert. The SME must be familiar with both the domain in question as well as the meaning of data at hand, while the IT expert should be familiar with the structure of data, how to access it, and have some programming (preferably Transact-SQL) knowledge. With more than one IT expert the most time consuming work, namely data preparation and overview, can be completed sooner. I assume that the relevant data is already extracted and available at the very beginning of the POC project. If a customer wants to have their people involved in the project directly and requests the transfer of knowledge, the project begins with training. I strongly advise this approach as it offers the establishment of a common background for all people involved, the understanding of how the algorithms work and the understanding of how the results should be interpreted, a way of becoming familiar with the SQL Server suite, and more. Once the data has been extracted, the customer’s SME (i.e. the analyst), and the IT expert assigned to the project will learn how to prepare the data in an efficient manner. Together with me, knowledge and expertise allow us to focus immediately on the most interesting attributes and identify any additional, calculated, ones soon after. By employing our programming knowledge, we can, for example, prepare tens of derived variables, detect outliers, identify the relationships between pairs of input variables, and more, in only two or three days, depending on the quantity and the quality of input data. I favor the customer’s decision of assigning additional personnel to the project. For example, I actually prefer to work with two teams simultaneously. I demonstrate and explain the subject matter by applying techniques directly on the data managed by each team, and then both teams continue to work on the data overview and data preparation under our supervision. I explain to the teams what kind of results we expect, the reasons why they are needed, and how to achieve them. Afterwards we review and explain the results, and continue with new instructions, until we resolve all known problems. Simultaneously with the data preparation the data overview is performed. The logic behind this task is the same – again I show to the teams involved the expected results, how to achieve them and what they mean. This is also done in multiple cycles as is the case with data preparation, because, quite frankly, both tasks are completely interleaved. A specific objective of the data overview is of principal importance – it is represented by a simple star schema and a simple OLAP cube that will first of all simplify data discovery and interpretation of the results, and will also prove useful in the following tasks. The presence of the customer’s SME is the key to resolving possible issues with the actual meaning of the data. We can always replace the IT part of the team with another database developer; however, we cannot conduct this kind of a project without the customer’s SME. After the data preparation and when the data overview is available, we begin the scientific part of the project. I assist the team in developing a variety of models, and in interpreting the results. The results are presented graphically, in an intuitive way. While it is possible to interpret the results on the fly, a much more appropriate alternative is possible if the initial training was also performed, because it allows the customer’s personnel to interpret the results by themselves, with only some guidance from me. The models are evaluated immediately by using several different techniques. One of the techniques includes evaluation over time, where we use an OLAP cube. After evaluating the models, we select the most appropriate model to be deployed for a production test; this allows the team to understand the deployment process. There are many possibilities of deploying data mining models into production; at the POC stage, we select the one that can be completed quickly. Typically, this means that we add the mining model as an additional dimension to an existing DW or OLAP cube, or to the OLAP cube developed during the data overview phase. Finally, we spend some time presenting the results of the POC project to the stakeholders and managers. Even from a POC, the customer will receive lots of benefits, all at the sole risk of spending money and time for a single 5 to 10 day project: The customer learns the basic patterns of frauds and fraud detection The customer learns how to do the entire cycle with their own people, only relying on me for the most complex problems The customer’s analysts learn how to perform much more in-depth analyses than they ever thought possible The customer’s IT experts learn how to perform data extraction and preparation much more efficiently than they did before All of the attendees of this training learn how to use their own creativity to implement further improvements of the process and procedures, even after the solution has been deployed to production The POC output for a smaller company or for a subsidiary of a larger company can actually be considered a finished, production-ready solution It is possible to utilize the results of the POC project at subsidiary level, as a finished POC project for the entire enterprise Typically, the project results in several important “side effects” Improved data quality Improved employee job satisfaction, as they are able to proactively contribute to the central knowledge about fraud patterns in the organization Because eventually more minds get to be involved in the enterprise, the company should expect more and better fraud detection patterns After the POC project is completed as described above, the actual project would not need months of engagement from my side. This is possible due to our preference to transfer the knowledge onto the customer’s employees: typically, the customer will use the results of the POC project for some time, and only engage me again to complete the project, or to ask for additional expertise if the complexity of the problem increases significantly. I usually expect to perform the following tasks: Establish the final infrastructure to measure the efficiency of the deployed models Deploy the models in additional scenarios Through reports By including Data Mining Extensions (DMX) queries in OLTP applications to support real-time early warnings Include data mining models as dimensions in OLAP cubes, if this was not done already during the POC project Create smart ETL applications that divert suspicious data for immediate or later inspection I would also offer to investigate how the outcome could be transferred automatically to the central system; for instance, if the POC project was performed in a subsidiary whereas a central system is available as well Of course, for the actual project, I would repeat the data and model preparation as needed It is virtually impossible to tell in advance how much time the deployment would take, before we decide together with customer what exactly the deployment process should cover. Without considering the deployment part, and with the POC project conducted as suggested above (including the transfer of knowledge), the actual project should still only take additional 5 to 10 days. The approximate timeline for the POC project is, as follows: 1-2 days of training 2-3 days for data preparation and data overview 2 days for creating and evaluating the models 1 day for initial preparation of the continuous learning infrastructure 1 day for presentation of the results and discussion of further actions Quite frequently I receive the following question: are we going to find the best possible model during the POC project, or during the actual project? My answer is always quite simple: I do not know. Maybe, if we would spend just one hour more for data preparation, or create just one more model, we could get better patterns and predictions. However, we simply must stop somewhere, and the best possible way to do this, according to my experience, is to restrict the time spent on the project in advance, after an agreement with the customer. You must also never forget that, because we build the complete learning infrastructure and transfer the knowledge, the customer will be capable of doing further investigations independently and improve the models and predictions over time without the need for a constant engagement with me.

    Read the article

  • Learn SSIS in London 12-14 Sep 2012!

    - by andyleonard
    My friends at TechniTrain , the students, and I had a blast during the March 2012 London delivery of From Zero To SSIS ! We have decided to do it again in September 2012 with my new Learning SSIS 2012 3-day course ! Please find a course outline here . It is difficult to list everything I cover in the course, but the outline hits the high spots. This material grew out of my experiences serving as a consultant on short-term engagements and as a manager (and enterprise ETL architect) for a team of forty...(read more)

    Read the article

  • Learn SSIS in London 12-14 Sep 2012!

    - by andyleonard
    My friends at TechniTrain , the students, and I had a blast during the March 2012 London delivery of From Zero To SSIS ! We have decided to do it again in September 2012 with my new Learning SSIS 2012 3-day course ! Please find a course outline here . It is difficult to list everything I cover in the course, but the outline hits the high spots. This material grew out of my experiences serving as a consultant on short-term engagements and as a manager (and enterprise ETL architect) for a team of forty...(read more)

    Read the article

  • Developing a Custom SSIS Source Component

    SSIS was designed to be extensible. Although you can create tasks that will take data from a wide variety of sources, transform the data is a number of ways and write the results a wide choice of destinations, using the components provided, there will always be occasions when you need to customise your own SSIS component. Yes, it is time to hone up your C# skills and cut some code, as Saurabh explains.

    Read the article

  • SQL Server 2008 - Script Data as Insert Statements from SSIS Package

    - by Brandon King
    SQL Server 2008 provides the ability to script data as Insert statements using the Generate Scripts option in Management Studio. Is it possible to access the same functionality from within a SSIS package? Here's what I'm trying to accomplish... I have a scheduled job that nightly scripts out all the schema and data for a SQL Server 2008 database and then uses the script to create a "mirror copy" SQLCE 3.5 database. I've been using Narayana Vyas Kondreddi's sp_generate_inserts stored procedure to accomplish this, but it has problems with some datatypes and greater-than-4K columns (holdovers from SQL Server 2000 days). The Script Data function looks like it could solve my problems, if only I could automate it. Any suggestions?

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >