Search Results

Search found 58774 results on 2351 pages for 'ssis data tranformations'.

Page 6/2351 | < Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13 | Next Page >

Why I don't use SSIS checkpoint files

- by jamiet

In a recent discussion in regard to general ETL best practises the subject of checkpoint files as a means for package restartability came up and I stated that I was dead against using them. For anyone that may care, here is why: Configuring them is distinctly unintuitive (that's a matter of opinion but if you follow the link I'll wager that you will agree) they don't make any allowance for loop iterations they cannot store variables of type Object they are limited in ability. There are many scenarios where you may want to execute certain containers regardless of whether the package is started from a checkpoint file but the current usage model does not allow for this. they are ignored by eventhandlers, which wouldn't be so bad if there were a way to toggle this behaviour in certain scenarios they dont work properly I'll expand on the last bullet point. I have encountered situations where the behaviour for tasks executing concurrently is unpredictable. That is, sometimes the completion of a task that executes concurrently with a failed/failing task will make it into the checkpoint file and sometimes it won't. This is near-impossible to reproduce but it does happen as my good friend John Welch will hopefully concur (if he is reading). Is anyone out there making successful use of checkpoint files within SSIS? I would be interested in knowing about that if so. @Jamiet

Read the article
Execute a SSIS package in Sync or Async mode from SQL Server 2012

- by Davide Mauri

Today I had to schedule a package stored in the shiny new SSIS Catalog store that can be enabled with SQL Server 2012. (http://msdn.microsoft.com/en-us/library/hh479588(v=SQL.110).aspx) Once your packages are stored here, they will be executed using the new stored procedures created for this purpose. This is the script that will get executed if you try to execute your packages right from management studio or through a SQL Server Agent job, will be similar to the following: Declare @execution_id bigint EXEC [SSISDB].[catalog].[create_execution] @package_name='my_package.dtsx', @execution_id=@execution_id OUTPUT, @folder_name=N'BI', @project_name=N'DWH', @use32bitruntime=False, @reference_id=Null Select @execution_id DECLARE @var0 smallint = 1 EXEC [SSISDB].[catalog].[set_execution_parameter_value] @execution_id, @object_type=50, @parameter_name=N'LOGGING_LEVEL', @parameter_value=@var0 DECLARE @var1 bit = 0 EXEC [SSISDB].[catalog].[set_execution_parameter_value] @execution_id, @object_type=50, @parameter_name=N'DUMP_ON_ERROR', @parameter_value=@var1 EXEC [SSISDB].[catalog].[start_execution] @execution_id GO The problem here is that the procedure will simply start the execution of the package and will return as soon as the package as been started…thus giving you the opportunity to execute packages asynchrously from your T-SQL code. This is just *great*, but what happens if I what to execute a package and WAIT for it to finish (and thus having a synchronous execution of it)? You have to be sure that you add the “SYNCHRONIZED” parameter to the package execution. Before the start_execution procedure: exec [SSISDB].[catalog].[set_execution_parameter_value] @execution_id, @object_type=50, @parameter_name=N'SYNCHRONIZED', @parameter_value=1 And that’s it . PS From the RC0, the SYNCHRONIZED parameter is automatically added each time you schedule a package execution through the SQL Server Agent. If you’re using an external scheduler, just keep this post in mind .

Read the article
Big Data – Operational Databases Supporting Big Data – Columnar, Graph and Spatial Database – Day 14 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the Key-Value Pair Databases and Document Databases in the Big Data Story. In this article we will understand the role of Columnar, Graph and Spatial Database supporting Big Data Story. Now we will see a few of the examples of the operational databases. Relational Databases (The day before yesterday’s post) NoSQL Databases (The day before yesterday’s post) Key-Value Pair Databases (Yesterday’s post) Document Databases (Yesterday’s post) Columnar Databases (Tomorrow’s post) Graph Databases (Today’s post) Spatial Databases (Today’s post) Columnar Databases Relational Database is a row store database or a row oriented database. Columnar databases are column oriented or column store databases. As we discussed earlier in Big Data we have different kinds of data and we need to store different kinds of data in the database. When we have columnar database it is very easy to do so as we can just add a new column to the columnar database. HBase is one of the most popular columnar databases. It uses Hadoop file system and MapReduce for its core data storage. However, remember this is not a good solution for every application. This is particularly good for the database where there is high volume incremental data is gathered and processed. Graph Databases For a highly interconnected data it is suitable to use Graph Database. This database has node relationship structure. Nodes and relationships contain a Key Value Pair where data is stored. The major advantage of this database is that it supports faster navigation among various relationships. For example, Facebook uses a graph database to list and demonstrate various relationships between users. Neo4J is one of the most popular open source graph database. One of the major dis-advantage of the Graph Database is that it is not possible to self-reference (self joins in the RDBMS terms) and there might be real world scenarios where this might be required and graph database does not support it. Spatial Databases We all use Foursquare, Google+ as well Facebook Check-ins for location aware check-ins. All the location aware applications figure out the position of the phone with the help of Global Positioning System (GPS). Think about it, so many different users at different location in the world and checking-in all together. Additionally, the applications now feature reach and users are demanding more and more information from them, for example like movies, coffee shop or places see. They are all running with the help of Spatial Databases. Spatial data are standardize by the Open Geospatial Consortium known as OGC. Spatial data helps answering many interesting questions like “Distance between two locations, area of interesting places etc.” When we think of it, it is very clear that handing spatial data and returning meaningful result is one big task when there are millions of users moving dynamically from one place to another place & requesting various spatial information. PostGIS/OpenGIS suite is very popular spatial database. It runs as a layer implementation on the RDBMS PostgreSQL. This makes it totally unique as it offers best from both the worlds. Courtesy: mushroom network Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Hive. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Can't save data for a member in a data form

- by RahulS

Implied sharing is an old thing everyone knows the reasons and solutions of that, still little theory about that: With Essbase implied sharing, some members are shared even if you do not explicitly set them as shared. These members are implied shared members. When an implied share relationship is created, each implied member assumes the other member’s value. Essbase assumes (or implies) a shared member relationship in these situations: 1. A parent has only one child 2. A parent has only one child that consolidates to the parent In a Planning form that contains members with an implied sharing relationship, when a value is added for the parent, the child assumes the same value after the form is saved. Likewise, if a value is added for the child, the parent usually assumes the same value after a form is saved.For example, when a calculation script or load rule populates an implied share member, the other implied share member assumes the value of the member populated by the calculation script or load rule. The last value calculated or imported takes precedence. The result is the same whether you refer to the parent or the child as a variable in a calculation script. For more information have a look at: http://docs.oracle.com/cd/E17236_01/epm.1112/hp_admin_11122/ch14s11.html Now the issue which we are going to talk about is We loose data on save even when the parent is dynamic calc and has a single child. A dynamic calc parent to a single child: If we design the form with following selection: In the data form we will find parent below the member and this is by design whenever you make a selection using commands to select all the member below parent, always children will appear before the parent: Lets try to enter data, Save it Now, try to change the way we selected members Here we go: Now the question again why this behavior: 1. Data from Planning data form passes to Essbase row by row, 2. Because in data form the child member appears before the parent, 3. First, data goes to Essbase for child (SingleStoreChild), 4. Then when Planning passes the data for parent there was #Missing or No data, 5. Over writes the data to #missing. PS: As we know that dynamic calc members are calculated on the fly they are not allocated with any memory in the Essbase, here the parent was dynamic calc and it was pointing to same memory as child in the background, when Planning was passing data to Essbase for second row it has updated the child with missing data.(Little confusing, let me know if you need more explanation) 6. As one of the solutions just change the order of appearance of parent and child. Cheers..!!! Rahul S. https://www.facebook.com/pages/HyperionPlanning/117320818374228

Read the article
Building dynamic OLAP data marts on-the-fly

- by DrJohn

At the forthcoming SQLBits conference, I will be presenting a session on how to dynamically build an OLAP data mart on-the-fly. This blog entry is intended to clarify exactly what I mean by an OLAP data mart, why you may need to build them on-the-fly and finally outline the steps needed to build them dynamically. In subsequent blog entries, I will present exactly how to implement some of the techniques involved. What is an OLAP data mart? In data warehousing parlance, a data mart is a subset of the overall corporate data provided to business users to meet specific business needs. Of course, the term does not specify the technology involved, so I coined the term "OLAP data mart" to identify a subset of data which is delivered in the form of an OLAP cube which may be accompanied by the relational database upon which it was built. To clarify, the relational database is specifically create and loaded with the subset of data and then the OLAP cube is built and processed to make the data available to the end-users via standard OLAP client tools. Why build OLAP data marts? Market research companies sell data to their clients to make money. To gain competitive advantage, market research providers like to "add value" to their data by providing systems that enhance analytics, thereby allowing clients to make best use of the data. As such, OLAP cubes have become a standard way of delivering added value to clients. They can be built on-the-fly to hold specific data sets and meet particular needs and then hosted on a secure intranet site for remote access, or shipped to clients' own infrastructure for hosting. Even better, they support a wide range of different tools for analytical purposes, including the ever popular Microsoft Excel. Extension Attributes: The Challenge One of the key challenges in building multiple OLAP data marts based on the same 'template' is handling extension attributes. These are attributes that meet the client's specific reporting needs, but do not form part of the standard template. Now clearly, these extension attributes have to come into the system via additional files and ultimately be added to relational tables so they can end up in the OLAP cube. However, processing these files and filling dynamically altered tables with SSIS is a challenge as SSIS packages tend to break as soon as the database schema changes. There are two approaches to this: (1) dynamically build an SSIS package in memory to match the new database schema using C#, or (2) have the extension attributes provided as name/value pairs so the file's schema does not change and can easily be loaded using SSIS. The problem with the first approach is the complexity of writing an awful lot of complex C# code. The problem of the second approach is that name/value pairs are useless to an OLAP cube; so they have to be pivoted back into a proper relational table somewhere in the data load process WITHOUT breaking SSIS. How this can be done will be part of future blog entry. What is involved in building an OLAP data mart? There are a great many steps involved in building OLAP data marts on-the-fly. The key point is that all the steps must be automated to allow for the production of multiple OLAP data marts per day (i.e. many thousands, each with its own specific data set and attributes). Now most of these steps have a great deal in common with standard data warehouse practices. The key difference is that the databases are all built to order. The only permanent database is the metadata database (shown in orange) which holds all the metadata needed to build everything else (i.e. client orders, configuration information, connection strings, client specific requirements and attributes etc.). The staging database (shown in red) has a short life: it is built, populated and then ripped down as soon as the OLAP Data Mart has been populated. In the diagram below, the OLAP data mart comprises the two blue components: the Data Mart which is a relational database and the OLAP Cube which is an OLAP database implemented using Microsoft Analysis Services (SSAS). The client may receive just the OLAP cube or both components together depending on their reporting requirements. So, in broad terms the steps required to fulfil a client order are as follows: Step 1: Prepare metadata Create a set of database names unique to the client's order Modify all package connection strings to be used by SSIS to point to new databases and file locations. Step 2: Create relational databases Create the staging and data mart relational databases using dynamic SQL and set the database recovery mode to SIMPLE as we do not need the overhead of logging anything Execute SQL scripts to build all database objects (tables, views, functions and stored procedures) in the two databases Step 3: Load staging database Use SSIS to load all data files into the staging database in a parallel operation Load extension files containing name/value pairs. These will provide client-specific attributes in the OLAP cube. Step 4: Load data mart relational database Load the data from staging into the data mart relational database, again in parallel where possible Allocate surrogate keys and use SSIS to perform surrogate key lookup during the load of fact tables Step 5: Load extension tables & attributes Pivot the extension attributes from their native name/value pairs into proper relational tables Add the extension attributes to the views used by OLAP cube Step 6: Deploy & Process OLAP cube Deploy the OLAP database directly to the server using a C# script task in SSIS Modify the connection string used by the OLAP cube to point to the data mart relational database Modify the cube structure to add the extension attributes to both the data source view and the relevant dimensions Remove any standard attributes that not required Process the OLAP cube Step 7: Backup and drop databases Drop staging database as it is no longer required Backup data mart relational and OLAP database and ship these to the client's infrastructure Drop data mart relational and OLAP database from the build server Mark order complete Start processing the next order, ad infinitum. So my future blog posts and my forthcoming session at the SQLBits conference will all focus on some of the more interesting aspects of building OLAP data marts on-the-fly such as handling the load of extension attributes and how to dynamically alter the structure of an OLAP cube using C#.

Read the article
Is Data Science “Science”?

- by BuckWoody

I hold the term “science” in very high esteem. I grew up on the Space Coast in Florida, and eventually worked at the Kennedy Space Center, surrounded by very intelligent people who worked in various scientific fields. Recently a new term has entered the computing dialog – “Data Scientist”. Since it’s not a standard term, it has a lot of definitions, and in fact has been disputed as a correct term. After all, the reasoning goes, if there’s no such thing as “Data Science” then how can there be a Data Scientist? This argument has been made before, albeit with a different term – “Computer Science”. In Peter Denning’s excellent article “Is Computer Science Science” (April 2005/Vol. 48, No. 4 COMMUNICATIONS OF THE ACM) there are many points that separate “science” from “engineering” and even “art”. I won’t repeat the content of that article here (I recommend you read it on your own) but will leverage the points he makes there. Definition of Science To ask the question “is data science ‘science’” then we need to start with a definition of terms. Various references put the definition into the same basic areas: Study of the physical world Systematic and/or disciplined study of a subject area ...and then they include the things studied, the bodies of knowledge and so on. The word itself comes from Latin, and means merely “to know” or “to study to know”. Greek divides knowledge further into “truth” (episteme), and practical use or effects (tekhne). Normally computing falls into the second realm. Definition of Data Science And now a more controversial definition: Data Science. This term is so new and perhaps so niche that the major dictionaries haven’t yet picked it up (my OED reference is older – can’t afford to pop for the online registration at present). Researching the term's general use I created an amalgam of the definitions this way: “Studying and applying mathematical and other techniques to derive information from complex data sets.” Using this definition, data science certainly seems to be science - it's learning about and studying some object or area using systematic methods. But implicit within the definition is the word “application”, which makes the process more akin to engineering or even technology than science. In fact, I find that using these techniques – and data itself – part of science, not science itself. I leave out the concept of studying data patterns or algorithms as part of this discipline. That is actually a domain I see within research, mathematics or computer science. That of course is a type of science, but does not seek for practical applications. As part of the argument against calling it “Data Science”, some point to the scientific method of creating a hypothesis, testing with controls, testing results against the hypothesis, and documenting for repeatability. These are not steps that we often take in working with data. We normally start with a question, and fit patterns and algorithms to predict outcomes and find correlations. In this way Data Science is more akin to statistics (and in fact makes heavy use of them) in the process rather than starting with an assumption and following on with it. So, is Data Science “Science”? I’m uncertain – and I’m uncertain it matters. Even if we are facing rampant “title inflation” these days (does anyone introduce themselves as a secretary or supervisor anymore?) I can tolerate the term at least from the intent that we use data to study problems across a wide spectrum, rather than restricting it to a single domain. And I also understand those who have worked hard to achieve the very honorable title of “scientist” who have issues with those who borrow the term without asking. What do you think? Science, or not? Does it matter?

Read the article
Data Modeling Resources

- by Dejan Sarka

You can find many different data modeling resources. It is impossible to list all of them. I selected only the most valuable ones for me, and, of course, the ones I contributed to. Books Chris J. Date: An Introduction to Database Systems – IMO a “must” to understand the relational model correctly. Terry Halpin, Tony Morgan: Information Modeling and Relational Databases – meet the object-role modeling leaders. Chris J. Date, Nikos Lorentzos and Hugh Darwen: Time and Relational Theory, Second Edition: Temporal Databases in the Relational Model and SQL – all theory needed to manage temporal data. Louis Davidson, Jessica M. Moss: Pro SQL Server 2012 Relational Database Design and Implementation – the best SQL Server focused data modeling book I know by two of my friends. Dejan Sarka, et al.: MCITP Self-Paced Training Kit (Exam 70-441): Designing Database Solutions by Using Microsoft® SQL Server™ 2005 – SQL Server 2005 data modeling training kit. Most of the text is still valid for SQL Server 2008, 2008 R2, 2012 and 2014. Itzik Ben-Gan, Lubor Kollar, Dejan Sarka, Steve Kass: Inside Microsoft SQL Server 2008 T-SQL Querying – Steve wrote a chapter with mathematical background, and I added a chapter with theoretical introduction to the relational model. Itzik Ben-Gan, Dejan Sarka, Roger Wolter, Greg Low, Ed Katibah, Isaac Kunen: Inside Microsoft SQL Server 2008 T-SQL Programming – I added three chapters with theoretical introduction and practical solutions for the user-defined data types, dynamic schema and temporal data. Dejan Sarka, Matija Lah, Grega Jerkic: Training Kit (Exam 70-463): Implementing a Data Warehouse with Microsoft SQL Server 2012 – my first two chapters are about data warehouse design and implementation. Courses Data Modeling Essentials – I wrote a 3-day course for SolidQ. If you are interested in this course, which I could also deliver in a shorter seminar way, you can contact your closes SolidQ subsidiary, or, of course, me directly on addresses [email protected] or [email protected]. This course could also complement the existing courseware portfolio of training providers, which are welcome to contact me as well. Logical and Physical Modeling for Analytical Applications – online course I wrote for Pluralsight. Working with Temporal data in SQL Server – my latest Pluralsight course, where besides theory and implementation I introduce many original ways how to optimize temporal queries. Forthcoming presentations SQL Bits 12, July 17th – 19th, Telford, UK – I have a full-day pre-conference seminar Advanced Data Modeling Topics there.

Read the article
Extending SSIS with custom Data Flow components (Presentation)

Download the slides and sample code from my Extending SSIS with custom Data Flow components presentation, first presented at the SQLBits II (The SQL) Community Conference. Abstract Get some real-world insights into developing data flow components for SSIS. This starts with an introduction to the data flow pipeline engine, and explains the real differences between adapters and the three sub-types of transformation. Understanding how the different types of component behave and manage data is key to writing components of your own, and probably should but be required knowledge for anyone building packages at all. Using sample code throughout, I will show you how to write components, as well as highlighting best practice and lessons learned. The sample code includes fully working example projects for source, destination and transformation components. Presentation & Samples (358KB) Extending SSIS with custom Data Flow components.zip

Read the article
Deploying Data Mining Models using Model Export and Import

- by [email protected]

In this post, we'll take a look at how Oracle Data Mining facilitates model deployment. After building and testing models, a next step is often putting your data mining model into a production system -- referred to as model deployment. The ability to move data mining model(s) easily into a production system can greatly speed model deployment, and reduce the overall cost. Since Oracle Data Mining provides models as first class database objects, models can be manipulated using familiar database techniques and technology. For example, one or more models can be exported to a flat file, similar to a database table dump file (.dmp). This file can be moved to a different instance of Oracle Database EE, and then imported. All methods for exporting and importing models are based on Oracle Data Pump technology and found in the DBMS_DATA_MINING package. Before performing the actual export or import, a directory object must be created. A directory object is a logical name in the database for a physical directory on the host computer. Read/write access to a directory object is necessary to access the host computer file system from within Oracle Database. For our example, we'll work in the DMUSER schema. First, DMUSER requires the privilege to create any directory. This is often granted through the sysdba account. grant create any directory to dmuser; Now, DMUSER can create the directory object specifying the path where the exported model file (.dmp) should be placed. In this case, on a linux machine, we have the directory /scratch/oracle. CREATE OR REPLACE DIRECTORY dmdir AS '/scratch/oracle'; If you aren't sure of the exact name of the model or models to export, you can find the list of models using the following query: select model_name from user_mining_models; There are several options when exporting models. We can export a single model, multiple models, or all models in a schema using the following procedure calls: BEGIN DBMS_DATA_MINING.EXPORT_MODEL ('MY_MODEL.dmp','dmdir','name =''MY_DT_MODEL'''); END; BEGIN DBMS_DATA_MINING.EXPORT_MODEL ('MY_MODELS.dmp','dmdir', 'name IN (''MY_DT_MODEL'',''MY_KM_MODEL'')'); END; BEGIN DBMS_DATA_MINING.EXPORT_MODEL ('ALL_DMUSER_MODELS.dmp','dmdir'); END; A .dmp file can be imported into another schema or database using the following procedure call, for example: BEGIN DBMS_DATA_MINING.IMPORT_MODEL('MY_MODELS.dmp', 'dmdir'); END; As with models from any data mining tool, when moving a model from one environment to another, care needs to be taken to ensure the transformations that prepare the data for model building are matched (with appropriate parameters and statistics) in the system where the model is deployed. Oracle Data Mining provides automatic data preparation (ADP) and embedded data preparation (EDP) to reduce, or possibly eliminate, the need to explicitly transport transformations with the model. In the case of ADP, ODM automatically prepares the data and includes the necessary transformations in the model itself. In the case of EDP, users can associate their own transformations with attributes of a model. These transformations are automatically applied when applying the model to data, i.e., scoring. Exporting and importing a model with ADP or EDP results in these transformations being immediately available with the model in the production system.

Read the article
Using Find/Replace with regular expressions inside a SSIS package

- by jamiet

Another one of those might-be-useful-again-one-day-so-I’ll-share-it-in-a-blog-post blog posts I am currently working on a SQL Server Integration Services (SSIS) 2012 implementation where each package contains a parameter called ETLIfcHist_ID: During normal execution this will get altered when the package is executed from the Execute Package Task however we want to make sure that at deployment-time they all have a default value of –1. Of course, they tend to get changed during development so I wanted a way of easily changing them all back to the default value. Opening up each package in turn and editing them was an option but given that we have over 40 packages and we might want to carry out this reset fairly frequently I needed a more automated method so I turned to Visual Studio’s Find/Replace… feature Of course, we don’t know what value will be in that parameter so I can’t simply search for a particular value; hence I opted to use a regular expression to identify the value to be change. In the rest of this blog post I’ll explain how to do that. For demonstration purposes I have taken the contents of a .dtsx file and stripped out everything except the element containing the parameters (<DTS:PackageParameters>), if you want to play along at home you can copy-paste the XML document below into a new XML file and open it up in Visual Studio: <?xml version="1.0"?> <DTS:Executable xmlns:DTS="www.microsoft.com/SqlServer/Dts"> <DTS:PackageParameters> <DTS:PackageParameter DTS:CreationName="" DTS:DataType="3" DTS:Description="InterfaceHistory_ID: used for Lineage" DTS:DTSID="{635616DB-EEEE-45C8-89AA-713E25846C7E}" DTS:ObjectName="ETLIfcHist_ID"> <DTS:Property DTS:DataType="3" DTS:Name="ParameterValue">VALUE_TO_BE_CHANGED</DTS:Property> </DTS:PackageParameter> <DTS:PackageParameter DTS:CreationName="" DTS:DataType="3" DTS:Description="Some other description" DTS:DTSID="{635616DB-EEEE-45C8-89AA-713E25845C7E}" DTS:ObjectName="SomeOtherObjectName"> <DTS:Property DTS:DataType="3" DTS:Name="ParameterValue">SomeOtherValue</DTS:Property> </DTS:PackageParameter> </DTS:PackageParameters> </DTS:Executable> We are trying to identify the value of the parameter whose name is ETLIfcHist_ID – notice that in the XML document above that value is VALUE_TO_BE_CHANGED. The following regular expression will find the appropriate portion of the XML document: {\<DTS\:PackageParameter[\n ]*DTS\:CreationName="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:DataType="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:Description="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:DTSID="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:ObjectName="ETLIfcHist_ID"\>[\n ]*\<DTS\:Property[\n ]*DTS\:DataType="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:Name="ParameterValue"\>}[A-Za-z0-9\:_\{\}- ]*{\<\/DTS\:Property\>} I have highlighted the name of the parameter that we’re looking for. I have also highlighted two portions identified by pairs of curly braces “{…}”; these are important because they pick out the two portions either side of the value I want to replace, in other words the portions highlighted here: <DTS:PackageParameters> <DTS:PackageParameter DTS:CreationName="" DTS:DataType="3" DTS:Description="InterfaceHistory_ID: used for Lineage" DTS:DTSID="{635616DB-EEEE-45C8-89AA-713E25846C7E}" DTS:ObjectName="ETLIfcHist_ID"> <DTS:Property DTS:DataType="3" DTS:Name="ParameterValue">VALUE_TO_BE_CHANGED</DTS:Property> </DTS:PackageParameter> Those sections in the curly braces are termed tag expressions and can be identified in the replace expression using a backslash and a number identifying which tag expression you’re referring to according to its ordinal position. Hence, our replace expression is simply: \1-1\2 We’re saying the portion of our file identified by the regular expression should be replaced by the first curly brace section, then the literal –1, then the second curly brace section. Make sense? Give it a go yourself by plugging those two expressions into Visual Studio’s Find and Replace dialog. If you set it to look in “All Open Documents” then you can open up the code-behind of all your packages and change all of them at once. The Find and Replace dialog will look like this: That’s it! I realise that not everyone will be looking to change the value of a parameter but hopefully I have shown you a technique that you can modify to work for your own scenario. Given that this blog post is, y’know, on the web I have no doubt that someone is going to find a fault with my find regex expression and if that person is you….that’s OK. Let me know about it in the comments below and perhaps we can work together to come up with something better! Note that some parameters may have a different set of properties (for example some, but not all, of my parameters have a DTS:Required attribute) so your find regular expression may have to change accordingly. When researching this I found the following article to be invaluable: Visual Studio Find/Replace Regular Expression Usage @Jamiet

Read the article
SQL SERVER – Data and Page Compressions – Data Storage and IO Improvement

- by pinaldave

The performance of SQL Server is primarily decided by the disk I/O efficiency. Improving I/O definitely improves the performance. SQL Server 2008 introduced Data and Backup compression features to improve the disk I/O. Here, I will explain Data compression. Data compression implies the reduction in the disk space reserved by data. Therefore, data compression can be [...]

Read the article
Why Oracle Data Integrator for Big Data?

- by Mala Narasimharajan

Big Data is everywhere these days - but what exactly is it? It’s data that comes from a multitude of sources – not only structured data, but unstructured data as well. The sheer volume of data is mindboggling – here are a few examples of big data: climate information collected from sensors, social media information, digital pictures, log files, online video files, medical records or online transaction records. These are just a few examples of what constitutes big data. Embedded in big data is tremendous value and being able to manipulate, load, transform and analyze big data is key to enhancing productivity and competitiveness. The value of big data lies in its propensity for greater in-depth analysis and data segmentation -- in turn giving companies detailed information on product performance, customer preferences and inventory. Furthermore, by being able to store and create more data in digital form, “big data can unlock significant value by making information transparent and usable at much higher frequency." (McKinsey Global Institute, May 2011) Oracle's flagship product for bulk data movement and transformation, Oracle Data Integrator, is a critical component of Oracle’s Big Data strategy. ODI provides automation, bulk loading, and validation and transformation capabilities for Big Data while minimizing the complexities of using Hadoop. Specifically, the advantages of ODI in a Big Data scenario are due to pre-built Knowledge Modules that drive processing in Hadoop. This leverages the graphical UI to load and unload data from Hadoop, perform data validations and create mapping expressions for transformations. The Knowledge Modules provide a key jump-start and eliminate a significant amount of Hadoop development. Using Oracle Data Integrator together with Oracle Big Data Connectors, you can simplify the complexities of mapping, accessing, and loading big data (via NoSQL or HDFS) but also correlating your enterprise data – this correlation may require integrating across heterogeneous and standards-based environments, connecting to Oracle Exadata, or sourcing via a big data platform such as Oracle Big Data Appliance. To learn more about Oracle Data Integration and Big Data, download our resource kit to see the latest in whitepapers, webinars, downloads, and more… or go to our website on www.oracle.com/bigdata

Read the article
General Policies and Procedures for Maintaining the Value of Data Assets

Here is a general list for policies and procedures regarding maintaining the value of data assets. Data Backup Policies and Procedures Backups are very important when dealing with data because there is always the chance of losing data due to faulty hardware or a user activity. So the need for a strategic backup system should be mandatory for all companies. This being said, in the real world some companies that I have worked for do not really have a good data backup plan. Typically when companies tend to take this kind of approach in data backups usually the data is not really recoverable. Unfortunately when companies do not regularly test their backup plans they get a false sense of security because they think that they are covered. However, I can tell you from personal and professional experience that a backup plan/system is never fully implemented until it is regularly tested prior to the time when it actually needs to be used. Disaster Recovery Plan Expanding on Backup Policies and Procedures, a company needs to also have a disaster recovery plan in order to protect its data in case of a catastrophic disaster. Disaster recovery plans typically encompass how to restore all of a company’s data and infrastructure back to a restored operational status. Most Disaster recovery plans also include time estimates on how long each step of the disaster recovery plan should take to be executed. It is important to note that disaster recovery plans are never fully implemented until they have been tested just like backup plans. Disaster recovery plans should be tested regularly so that the business can be confident in not losing any or minimal data due to a catastrophic disaster. Firewall Policies and Content Filters One way companies can protect their data is by using a firewall to separate their internal network from the outside. Firewalls allow for enabling or disabling network access as data passes through it by applying various defined restrictions. Furthermore firewalls can also be used to prevent access from the internal network to the outside by these same factors. Common Firewall Restrictions Destination/Sender IP Address Destination/Sender Host Names Domain Names Network Ports Companies can also desire to restrict what their network user’s view on the internet through things like content filters. Content filters allow a company to track what webpages a person has accessed and can also restrict user’s access based on established rules set up in the content filter. This device and/or software can block access to domains or specific URLs based on a few factors. Common Content Filter Criteria Known malicious sites Specific Page Content Page Content Theme Anti-Virus/Mal-ware Polices Fortunately, most companies utilize antivirus programs on all computers and servers for good reason, virus have been known to do the following: Corrupt/Invalidate Data, Destroy Data, and Steal Data. Anti-Virus applications are a great way to prevent any malicious application from being able to gain access to a company’s data. However, anti-virus programs must be constantly updated because new viruses are always being created, and the anti-virus vendors need to distribute updates to their applications so that they can catch and remove them. Data Validation Policies and Procedures Data validation is very important to ensure that only accurate information is stored. The existence of invalid data can cause major problems when businesses attempt to use data for knowledge based decisions and for performance reporting. Data Scrubbing Policies and Procedures Data scrubbing is valuable to companies in one of two ways. The first can be used to clean data prior to being analyzed for report generation. The second is that it allows companies to remove things like personally Identifiable information from its data prior to transmit it between multiple environments or if the information is sent to an external location. An example of this can be seen with medical records in regards to HIPPA laws that prohibit the storage of specific personal and medical information. Additionally, I have professionally run in to a scenario where the Canadian government does not allow any Canadian’s personal information to be stored on a server not located in Canada. Encryption Practices The use of encryption is very valuable when a company needs to any personal information. This allows users with the appropriated access levels to view or confirm the existence or accuracy of data within a system by either decrypting the information or encrypting a piece of data and comparing it to the stored version. Additionally, if for some unforeseen reason the data got in to the wrong hands then they would have to first decrypt the data before they could even be able to read it. Encryption just adds and additional layer of protection around data itself. Standard Normalization Practices The use of standard data normalization practices is very important when dealing with data because it can prevent allot of potential issues by eliminating the potential for unnecessary data duplication. Issues caused by data duplication include excess use of data storage, increased chance for invalidated data, and over use of data processing. Network and Database Security/Access Policies Every company has some form of network/data access policy even if they have none. These policies help secure data from being seen by inappropriate users along with preventing the data from being updated or deleted by users. In addition, without a good security policy there is a large potential for data to be corrupted by unassuming users or even stolen. Data Storage Policies Data storage polices are very important depending on how they are implemented especially when a company is trying to utilize them in conjunction with other policies like Data Backups. I have worked at companies where all network user folders are constantly backed up, and if a user wanted to ensure the existence of a piece of data in the form of a file then they had to store that file in their network folder. Conversely, I have also worked in places where when a user logs on or off of the network there entire user profile is backed up. Training Policies One of the biggest ways to prevent data loss and ensure that data will remain a company asset is through training. The practice of properly train employees on how to work with in systems that access data is crucial when trying to ensure a company’s data will remain an asset. Users need to be trained on how to manipulate a company’s data in order to perform their tasks to reduce the chances of invalidating data.

Read the article
SSIS and StreamInsight Working Together.

I have been thinking a lot recently about what it would be like to have StreamInsight and SSIS working together. Well the CAT team have produced a paper on some of our options here. Here are some of my thoughts. There is of course a slight mismatch in their types of usage. StreamInsight is an Event Stream processing engine capable of operating on new data in the sub second timeframe. The engine allows you to do real time analytics and take decisions on events that have potentially only just happened. SSIS on the other hand is a batch processing engine. In general I do not like having to invoke the same package more than once every 90 seconds or so as it can start to get expensive. Usually when doing batch processing we have an hour or longer of grace before we have to move data from A –> B. StreamInsight operates on streams of data. Before anyone mentions it yes I know StreamInsight is equally adept at using the IEnumerable interface, but I would argue live streaming and real-time analytics is a primary goal of the product. SSIS does not have an “Always On” button I do not like the idea of embedding StreamInsight inside SSIS using a transform particularly. It means StreamInsight becomes a batch processing engine because it can only operate when the SSIS package is running and SSIS is in charge of when that happens. If I am to have StreamInsight within SSIS then I prefer to have StreamInsight on the adapters. This way you can force the adapters to stay open and introduce events into your Pipeline. SSIS has a much richer set of transforms out of the box than StreamInsight. Although “Always On” was not a design goal of SSIS I have used it like this and it works just fine. SSIS being called from within StreamInsight, now that excites me. see below For a while now I have been thinking what it would be like to decouple the Data Flow task from the SSIS package and expose it as something with which you can interact. Anything can instantiate this version of a DFT as it would expose one or more input interfaces and one or more output interfaces. I can imagine that this would be a big hit when moving to “The Cloud” as well. I could see the Data Flow task maybe being hosted in Azure Appfabric or some such layer. StreamInsight would be able to take advantage of this as well. I am interested to see where this goes and will be pressing for more meat around the subject when I visit Redmond soon.

Read the article
Importing Multiple Schemas to a Model in Oracle SQL Developer Data Modeler

- by thatjeffsmith

Your physical data model might stretch across multiple Oracle schemas. Or maybe you just want a single diagram containing tables, views, etc. spanning more than a single user in the database. The process for importing a data dictionary is the same, regardless if you want to suck in objects from one schema, or many schemas. Let’s take a quick look at how to get started with a data dictionary import. I’m using Oracle SQL Developer in this example. The process is nearly identical in Oracle SQL Developer Data Modeler – the only difference being you’ll use the ‘File’ menu to get started versus the ‘File – Data Modeler’ menu in SQL Developer. Remember, the functionality is exactly the same whether you use SQL Developer or SQL Developer Data Modeler when it comes to the data modeling features – you’ll just have a cleaner user interface in SQL Developer Data Modeler. Importing a Data Dictionary to a Model You’ll want to open or create your model first. You can import objects to an existing or new model. The easiest way to get started is to simply open the ‘Browser’ under the View menu. The Browser allows you to navigate your open designs/models You’ll see an ‘Untitled_1′ model by default. I’ve renamed mine to ‘hr_sh_scott_demo.’ Now go back to the File menu, and expand the ‘Data Modeler’ section, and select ‘Import – Data Dictionary.’ This is a fancy way of saying, ‘suck objects out of the database into my model’ Connect! If you haven’t already defined a connection to the database you want to reverse engineer, you’ll need to do that now. I’m going to assume you already have that connection – so select it, and hit the ‘Next’ button. Select the Schema(s) to be imported Select one or more schemas you want to import The schemas selected on this page of the wizard will dictate the lists of tables, views, synonyms, and everything else you can choose from in the next wizard step to import. For brevity, I have selected ALL tables, views, and synonyms from 3 different schemas: HR SCOTT SH Once I hit the ‘Finish’ button in the wizard, SQL Developer will interrogate the database and add the objects to our model. The Big Model and the 3 Little Models I can now see ALL of the objects I just imported in the ‘hr_sh_scott_demo’ relational model in my design tree, and in my relational diagram. Quick Tip: Oracle SQL Developer calls what most folks think of as a ‘Physical Model’ the ‘Relational Model.’ Same difference, mostly. In SQL Developer, a Physical model allows you to define partitioning schemes, advanced storage parameters, and add your PL/SQL code. You can have multiple physical models per relational models. For example I might have a 4 Node RAC in Production that uses partitioning, but in test/dev, only have a single instance with no partitioning. I can have models for both of those physical implementations. The list of tables in my relational model Wouldn’t it be nice if I could segregate the objects based on their schema? Good news, you can! And it’s done by default Several of you might already know where I’m going with this – SUBVIEWS. You can easily create a ‘SubView’ by selecting one or more objects in your model or diagram and add them to a new SubView. SubViews are just mini-models. They contain a subset of objects from the main model. This is very handy when you want to break your model into smaller, more digestible parts. The model information is identical across the model and subviews, so you don’t have to worry about making a change in one place and not having it propagate across your design. SubViews can be used as filters when you create reports and exports as well. So instead of generating a PDF for everything, just show me what’s in my ‘ABC’ subview. But, I don’t want to do any work! Remember, I’m really lazy. More good news – it’s already done by default! The schemas are automatically used to create default SubViews Auto-Navigate to the Object in the Diagram In the subview tree node, right-click on the object you want to navigate to. You can ask to be taken to the main model view or to the SubView location. If you haven’t already opened the SubView in the diagram, it will be automatically opened for you. The SubView diagram only contains the objects from that SubView Your SubView might still be pretty big, many dozens of objects, so don’t forget about the ‘Navigator‘ either! In summary, use the ‘Import’ feature to add existing database objects to your model. If you import from multiple schemas, take advantage of the default schema based SubViews to help you manage your models! Sometimes less is more!

Read the article
In SSIS Convert European Currency Format to United States Currency Format

- by Rob

I have an interesting problem. I have an SSIS package that processes account data. We are now processing files from Europe. These files are in a CSV format using text qualifiers. For an example of the problem: In the United States the currency format is 123456.99 (We purposely leave the thousands separator out). The files sent from Europe are coming in with two formats. One is 123456,99 and the other is 123.456,00. SSIS is attempting to parse the text file and place it into a NUMERIC(20,2) field. This causes a parsing error in SSIS even with the text qualifiers. If I change the field to CURRENCY it sends a conversion error. I would like for SSIS to deal with this directly without requiring the data to be in the United States format. Has anyone had this problem? Any help will be greatly appreciated. Rob

Read the article
MS SQL dts to ssis migration error

- by Manjot

Hi, I have migrated some DTS packages to SSIS 2005 using "Migration" wizard. When I tried to run it, it fails saying you need a higher version of SSIS even though the destination SSIS server is on 9.0.4211 level. then I digged in the package using business intelligence studio and saw that one of the package subtasks is "Transform data task" (the dts version) and the package fails to run that. The storage location for this dts task is set to "Embedded in Task". I didn't touch it. why didn't it convert this task to an SSIS data flow task? any help please? Thansk in advance

Read the article
How *not* to handle a compensation step on failure in an SSIS package

- by James Luetkehoelter

Just stumbed across this where I'm working. Someone created a global error handler for a package that included this SQL step: DELETE FROM Table WHERE DateDiff(MI, ExportedDate, GetDate()) < 5 So if the package runs for longer than 5 minutes and fails, nothing gets cleaned up. Please people, don't do this... Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!...(read more)

Read the article
SQL Server Training in the UK–SSIS, MDX, Admin, MDS, Internals

- by simonsabin

If you are looking for SQL Server training they there is no better place to start than a new company Technitrain Its been setup by a fellow MVP and SQLBits Organiser Chris Webb. Why this company rather than any others? Training based on real world experience by the best in the business. The key to Technitrain’s model is not to cram the shelves high with courses and get some average Joe trainers to deliver them. Technitrain bring in world renowned experts in their fields to deliver courses written...(read more)

Read the article
Consolidating SQL Server Error Logs from Multiple Instances Using SSIS

SQL Server hides a lot of very useful information in its error log files. Unfortunately, the process of hunting through all these logs, file-by-file, server-by-server, can cause a problem. Rodney Landrum offers a solution which will allow you to pull error log records from multiple servers into a central database, for analysis and reporting with T-SQL.

Read the article
Deploying Data Mining Models using Model Export and Import, Part 2

- by [email protected]

In my last post, Deploying Data Mining Models using Model Export and Import, we explored using DBMS_DATA_MINING.EXPORT_MODEL and DBMS_DATA_MINING.IMPORT_MODEL to enable moving a model from one system to another. In this post, we'll look at two distributed scenarios that make use of this capability and a tip for easily moving models from one machine to another using only Oracle Database, not an external file transport mechanism, such as FTP. The first scenario, consider a company with geographically distributed business units, each collecting and managing their data locally for the products they sell. Each business unit has in-house data analysts that build models to predict which products to recommend to customers in their space. A central telemarketing business unit also uses these models to score new customers locally using data collected over the phone. Since the models recommend different products, each customer is scored using each model. This is depicted in Figure 1.Figure 1: Target instance importing multiple remote models for local scoring In the second scenario, consider multiple hospitals that collect data on patients with certain types of cancer. The data collection is standardized, so each hospital collects the same patient demographic and other health / tumor data, along with the clinical diagnosis. Instead of each hospital building it's own models, the data is pooled at a central data analysis lab where a predictive model is built. Once completed, the model is distributed to hospitals, clinics, and doctor offices who can score patient data locally.Figure 2: Multiple target instances importing the same model from a source instance for local scoring Since this blog focuses on model export and import, we'll only discuss what is necessary to move a model from one database to another. Here, we use the package DBMS_FILE_TRANSFER, which can move files between Oracle databases. The script is fairly straightforward, but requires setting up a database link and directory objects. We saw how to create directory objects in the previous post. To create a database link to the source database from the target, we can use, for example: create database link SOURCE1_LINK connect to <schema> identified by <password> using 'SOURCE1'; Note that 'SOURCE1' refers to the service name of the remote database entry in your tnsnames.ora file. From SQL*Plus, first connect to the remote database and export the model. Note that the model_file_name does not include the .dmp extension. This is because export_model appends "01" to this name. Next, connect to the local database and invoke DBMS_FILE_TRANSFER.GET_FILE and import the model. Note that "01" is eliminated in the target system file name. connect <source_schema>/<password>@SOURCE1_LINK; BEGIN DBMS_DATA_MINING.EXPORT_MODEL ('EXPORT_FILE_NAME' || '.dmp', 'MY_SOURCE_DIR_OBJECT', 'name =''MY_MINING_MODEL'''); END; connect <target_schema>/<password>; BEGIN DBMS_FILE_TRANSFER.GET_FILE ('MY_SOURCE_DIR_OBJECT', 'EXPORT_FILE_NAME' || '01.dmp', 'SOURCE1_LINK', 'MY_TARGET_DIR_OBJECT', 'EXPORT_FILE_NAME' || '.dmp' ); DBMS_DATA_MINING.IMPORT_MODEL ('EXPORT_FILE_NAME' || '.dmp', 'MY_TARGET_DIR_OBJECT'); END; To clean up afterward, you may want to drop the exported .dmp file at the source and the transferred file at the target. For example, utl_file.fremove('&directory_name', '&model_file_name' || '.dmp');

Read the article
Abstract Data Type and Data Structure

- by mark075

It's quite difficult for me to understand these terms. I searched on google and read a little on Wikipedia but I'm still not sure. I've determined so far that: Abstract Data Type is a definition of new type, describes its properties and operations. Data Structure is an implementation of ADT. Many ADT can be implemented as the same Data Structure. If I think right, array as ADT means a collection of elements and as Data Structure, how it's stored in a memory. Stack is ADT with push, pop operations, but can we say about stack data structure if I mean I used stack implemented as an array in my algorithm? And why heap isn't ADT? It can be implemented as tree or an array.

Read the article
Error in retrieving data from Excel File

- by Sreejesh Kumar

I have an excel file. I wanted to pull the data from excel file to SQL Server table. And the data is successfully transferred.In the excel file, I removed a text from one column named "Risk" from one row.The text was lengthy one.now the package execution fails at the source ie from the excel file. The errors are shown as "[Audit [1]] Error: There was an error with output column "Risk" (100) on output "Excel Source Output" (9). The column status returned was: "DBSTATUS_UNAVAILABLE"." and "[Audit [1]] Error: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "output column "Risk" (100)" failed because error code 0xC0209071 occurred, and the error row disposition on "output column "Risk" (100)" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure." the error occurs only when I remove this particular text from this row.

Read the article
More Value From Data Using Data Mining Presentation

Here is a presentation I gave at the SQLBits conference in September which was recorded by Microsoft. Usually I speak about SSIS but on this particular event I thought people would like to hear something different from me. Microsoft are making a big play for making Data Mining more accessible to everyone and not just boffins. In this presentation I give an overview of data mining and then do some demonstrations using the excellent Excel Add-Ins available from Microsoft SQL Server 2008 SQL Server 2005 I hope you enjoy this presentation http://go.microsoft.com/?linkid=9633764

Read the article
Updates to Stairway to Integration Services

- by andyleonard

The Stairway to integration Services has been updated! I added content to Step 1 to provide more detail about creating a first SSIS project and corrected a typo in Step 2 that referred to an older name for the Step 1 article. I also made the corrected Step 1 article name a link to help. Thanks to Steve Jones ( blog | @way0utwest ) for all his hard work editing and corralling trifling authors. :{>...(read more)

Read the article

Search Results

Search found 58774 results on 2351 pages for 'ssis data tranformations'.

Page 6/2351 | < Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13 | Next Page >

- by jamiet

- by Davide Mauri

- by Pinal Dave

- by RahulS

- by DrJohn

- by BuckWoody

- by Dejan Sarka

- by [email protected]

- by jamiet

- by pinaldave

- by Mala Narasimharajan

- by thatjeffsmith

- by Rob

- by Manjot

- by James Luetkehoelter

- by simonsabin

- by [email protected]

- by mark075

- by Sreejesh Kumar

- by andyleonard

< Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13 | Next Page >