Search Results

Search found 58965 results on 2359 pages for 'ssis data transformations'.

Page 6/2359 | < Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13  | Next Page >

  • SSIS is Case-Sensitive

    - by andyleonard
    Introduction SSIS is case-sensitive even if the database is case-insensitive. Imagine... ... you work in an ETL shop where someone who believes in natural keys won the Battle of the Joins. Imagine one of your natural keys is a string. (I know it's a stretch... play along!). Let's build some tables to sketch it out. If you do not have a TestDB database, why not? Build one! You'll use it often. Use TestDB go Create Table SSIS1 ( StrID char ( 5 ) , Name varchar ( 15 ) , Value int ) Insert Into SSIS1...(read more)

    Read the article

  • SQL Server 2008 - Script Data as Insert Statements from SSIS Package

    - by Brandon King
    SQL Server 2008 provides the ability to script data as Insert statements using the Generate Scripts option in Management Studio. Is it possible to access the same functionality from within a SSIS package? Here's what I'm trying to accomplish... I have a scheduled job that nightly scripts out all the schema and data for a SQL Server 2008 database and then uses the script to create a "mirror copy" SQLCE 3.5 database. I've been using Narayana Vyas Kondreddi's sp_generate_inserts stored procedure to accomplish this, but it has problems with some datatypes and greater-than-4K columns (holdovers from SQL Server 2000 days). The Script Data function looks like it could solve my problems, if only I could automate it. Any suggestions?

    Read the article

  • SSIS - XML Source Script

    - by simonsabin
    The XML Source in SSIS is great if you have a 1 to 1 mapping between entity and table. You can do more complex mapping but it becomes very messy and won't perform. What other options do you have? The challenge with XML processing is to not need a huge amount of memory. I remember using the early versions of Biztalk with loaded the whole document into memory to map from one document type to another. This was fine for small documents but was an absolute killer for large documents. You therefore need a streaming approach. For flexibility however you want to be able to generate your rows easily, and if you've ever used the XmlReader you will know its ugly code to write. That brings me on to LINQ. The is an implementation of LINQ over XML which is really nice. You can write nice LINQ queries instead of the XMLReader stuff. The downside is that by default LINQ to XML requires a whole XML document to work with. No streaming. Your code would look like this. We create an XDocument and then enumerate over a set of annoymous types we generate from our LINQ statement XDocument x = XDocument.Load("C:\\TEMP\\CustomerOrders-Attribute.xml");   foreach (var xdata in (from customer in x.Elements("OrderInterface").Elements("Customer")                        from order in customer.Elements("Orders").Elements("Order")                        select new { Account = customer.Attribute("AccountNumber").Value                                   , OrderDate = order.Attribute("OrderDate").Value }                        )) {     Output0Buffer.AddRow();     Output0Buffer.AccountNumber = xdata.Account;     Output0Buffer.OrderDate = Convert.ToDateTime(xdata.OrderDate); } As I said the downside to this is that you are loading the whole document into memory. I did some googling and came across some helpful videos from a nice UK DPE Mike Taulty http://www.microsoft.com/uk/msdn/screencasts/screencast/289/LINQ-to-XML-Streaming-In-Large-Documents.aspx. Which show you how you can combine LINQ and the XmlReader to get a semi streaming approach. I took what he did and implemented it in SSIS. What I found odd was that when I ran it I got different numbers between using the streamed and non streamed versions. I found the cause was a little bug in Mikes code that causes the pointer in the XmlReader to progress past the start of the element and thus foreach (var xdata in (from customer in StreamReader("C:\\TEMP\\CustomerOrders-Attribute.xml","Customer")                                from order in customer.Elements("Orders").Elements("Order")                                select new { Account = customer.Attribute("AccountNumber").Value                                           , OrderDate = order.Attribute("OrderDate").Value }                                ))         {             Output0Buffer.AddRow();             Output0Buffer.AccountNumber = xdata.Account;             Output0Buffer.OrderDate = Convert.ToDateTime(xdata.OrderDate);         } These look very similiar and they are the key element is the method we are calling, StreamReader. This method is what gives us streaming, what it does is return a enumerable list of elements, because of the way that LINQ works this results in the data being streamed in. static IEnumerable<XElement> StreamReader(String filename, string elementName) {     using (XmlReader xr = XmlReader.Create(filename))     {         xr.MoveToContent();         while (xr.Read()) //Reads the first element         {             while (xr.NodeType == XmlNodeType.Element && xr.Name == elementName)             {                 XElement node = (XElement)XElement.ReadFrom(xr);                   yield return node;             }         }         xr.Close();     } } This code is specifically designed to return a list of the elements with a specific name. The first Read reads the root element and then the inner while loop checks to see if the current element is the type we want. If not we do the xr.Read() again until we find the element type we want. We then use the neat function XElement.ReadFrom to read an element and all its sub elements into an XElement. This is what is returned and can be consumed by the LINQ statement. Essentially once one element has been read we need to check if we are still on the same element type and name (the inner loop) This was Mikes mistake, if we called .Read again we would advance the XmlReader beyond the start of the Element and so the ReadFrom method wouldn't work. So with the code above you can use what ever LINQ statement you like to flatten your XML into the rowsets you want. You could even have multiple outputs and generate your own surrogate keys.        

    Read the article

  • SSIS - Range lookups

    - by Repieter
      When developing an ETL solution in SSIS we sometimes need to do range lookups in SSIS. Several solutions for this can be found on the internet, but now we have built another solution which I would like to share, since it's pretty easy to implement and the performance is fast.   You can download the sample package to see how it works. Make sure you have the AdventureWorks2008R2 and AdventureWorksDW2008R2 databases installed. (Apologies for the layout of this blog, I don't do this too often :))   To give a little bit more information about the example, this is basically what is does: we load a facttable and do an SCD type 2 lookup operation of the Product dimension. This is done with a script component.   First we query the Data warehouse to create the lookup dataset. The query that is used for that is:   SELECT     [ProductKey]     ,[ProductAlternateKey]     ,[StartDate]     ,ISNULL([EndDate], '9999-01-01') AS EndDate FROM [DimProduct]     The output of this query is stored in a DataTable:     string lookupQuery = @"                         SELECT                             [ProductKey]                             ,[ProductAlternateKey]                             ,[StartDate]                             ,ISNULL([EndDate], '9999-01-01') AS EndDate                         FROM [DimProduct]";           OleDbCommand oleDbCommand = new OleDbCommand(lookupQuery, _oleDbConnection);         OleDbDataAdapter adapter = new OleDbDataAdapter(oleDbCommand);           _dataTable = new DataTable();         adapter.Fill(_dataTable);     Now that the dimension data is stored in the DataTable we use the following method to do the actual lookup:   public int RangeLookup(string businessKey, DateTime lookupDate)     {         // set default return value (Unknown)         int result = -1;           DataRow[] filteredRows;         filteredRows = _dataTable.Select(string.Format("ProductAlternateKey = '{0}'", businessKey));           for (int i = 0; i < filteredRows.Length; i++)         {             // check if the lookupdate is found between the startdate and enddate of any of the records             if (lookupDate >= (DateTime)filteredRows[i][2] && lookupDate < (DateTime)filteredRows[i][3])             {                 result = (filteredRows[i][0] == null) ? -1 : (int)filteredRows[i][0];                 break;             }         }           filteredRows = null;           return result;     }       This method is executed for every row that passes the script component. This is implemented in the ProcessInputRow method   public override void Input0_ProcessInputRow(Input0Buffer Row)     {         // Perform the lookup operation on the current row and put the value in the Surrogate Key Attribute         Row.ProductKey = RangeLookup(Row.ProductNumber, Row.OrderDate);     }   Now what actually happens?!   1. Every record passes the business key and the orderdate to the RangeLookup method. 2. The DataTable is then filtered on the business key of the current record. The output is stored in a DataRow [] object. 3. We loop over the DataRow[] object to see where the orderdate meets the following expression: (lookupDate >= (DateTime)filteredRows[i][2] && lookupDate < (DateTime)filteredRows[i][3]) 4. When the expression returns true (so where the data is between the Startdate and the EndDate), the surrogate key of the dimension record is returned   We have done some testing with this solution and it works great for us. Hope others can use this example to do their range lookups.

    Read the article

  • What is the definition of "Big Data"?

    - by Ben
    Is there one? All the definitions I can find describe the size, complexity / variety or velocity of the data. Wikipedia's definition is the only one I've found with an actual number Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set. However, this seemingly contradicts the MIKE2.0 definition, referenced in the next paragraph, which indicates that "big" data can be small and that 100,000 sensors on an aircraft creating only 3GB of data could be considered big. IBM despite saying that: Big data is more simply than a matter of size. have emphasised size in their definition. O'Reilly has stressed "volume, velocity and variety" as well. Though explained well, and in more depth, the definition seems to be a re-hash of the others - or vice-versa of course. I think that a Computer Weekly article title sums up a number of articles fairly well "What is big data and how can it be used to gain competitive advantage". But ZDNet wins with the following from 2012: “Big Data” is a catch phrase that has been bubbling up from the high performance computing niche of the IT market... If one sits through the presentations from ten suppliers of technology, fifteen or so different definitions are likely to come forward. Each definition, of course, tends to support the need for that supplier’s products and services. Imagine that. Basically "big data" is "big" in some way shape or form. What is "big"? Is it quantifiable at the current time? If "big" is unquantifiable is there a definition that does not rely solely on generalities?

    Read the article

  • Why I don't use SSIS checkpoint files

    - by jamiet
    In a recent discussion in regard to general ETL best practises the subject of checkpoint files as a means for package restartability came up and I stated that I was dead against using them. For anyone that may care, here is why: Configuring them is distinctly unintuitive (that's a matter of opinion but if you follow the link I'll wager that you will agree) they don't make any allowance for loop iterations they cannot store variables of type Object they are limited in ability. There are many scenarios where you may want to execute certain containers regardless of whether the package is started from a checkpoint file but the current usage model does not allow for this. they are ignored by eventhandlers, which wouldn't be so bad if there were a way to toggle this behaviour in certain scenarios they dont work properly I'll expand on the last bullet point. I have encountered situations where the behaviour for tasks executing concurrently is unpredictable. That is, sometimes the completion of a task that executes concurrently with a failed/failing task will make it into the checkpoint file and sometimes it won't. This is near-impossible to reproduce but it does happen as my good friend John Welch will hopefully concur (if he is reading). Is anyone out there making successful use of checkpoint files within SSIS? I would be interested in knowing about that if so. @Jamiet

    Read the article

  • Big Data – Operational Databases Supporting Big Data – Columnar, Graph and Spatial Database – Day 14 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the Key-Value Pair Databases and Document Databases in the Big Data Story. In this article we will understand the role of Columnar, Graph and Spatial Database supporting Big Data Story. Now we will see a few of the examples of the operational databases. Relational Databases (The day before yesterday’s post) NoSQL Databases (The day before yesterday’s post) Key-Value Pair Databases (Yesterday’s post) Document Databases (Yesterday’s post) Columnar Databases (Tomorrow’s post) Graph Databases (Today’s post) Spatial Databases (Today’s post) Columnar Databases  Relational Database is a row store database or a row oriented database. Columnar databases are column oriented or column store databases. As we discussed earlier in Big Data we have different kinds of data and we need to store different kinds of data in the database. When we have columnar database it is very easy to do so as we can just add a new column to the columnar database. HBase is one of the most popular columnar databases. It uses Hadoop file system and MapReduce for its core data storage. However, remember this is not a good solution for every application. This is particularly good for the database where there is high volume incremental data is gathered and processed. Graph Databases For a highly interconnected data it is suitable to use Graph Database. This database has node relationship structure. Nodes and relationships contain a Key Value Pair where data is stored. The major advantage of this database is that it supports faster navigation among various relationships. For example, Facebook uses a graph database to list and demonstrate various relationships between users. Neo4J is one of the most popular open source graph database. One of the major dis-advantage of the Graph Database is that it is not possible to self-reference (self joins in the RDBMS terms) and there might be real world scenarios where this might be required and graph database does not support it. Spatial Databases  We all use Foursquare, Google+ as well Facebook Check-ins for location aware check-ins. All the location aware applications figure out the position of the phone with the help of Global Positioning System (GPS). Think about it, so many different users at different location in the world and checking-in all together. Additionally, the applications now feature reach and users are demanding more and more information from them, for example like movies, coffee shop or places see. They are all running with the help of Spatial Databases. Spatial data are standardize by the Open Geospatial Consortium known as OGC. Spatial data helps answering many interesting questions like “Distance between two locations, area of interesting places etc.” When we think of it, it is very clear that handing spatial data and returning meaningful result is one big task when there are millions of users moving dynamically from one place to another place & requesting various spatial information. PostGIS/OpenGIS suite is very popular spatial database. It runs as a layer implementation on the RDBMS PostgreSQL. This makes it totally unique as it offers best from both the worlds. Courtesy: mushroom network Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Hive. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Execute a SSIS package in Sync or Async mode from SQL Server 2012

    - by Davide Mauri
    Today I had to schedule a package stored in the shiny new SSIS Catalog store that can be enabled with SQL Server 2012. (http://msdn.microsoft.com/en-us/library/hh479588(v=SQL.110).aspx) Once your packages are stored here, they will be executed using the new stored procedures created for this purpose. This is the script that will get executed if you try to execute your packages right from management studio or through a SQL Server Agent job, will be similar to the following: Declare @execution_id bigint EXEC [SSISDB].[catalog].[create_execution] @package_name='my_package.dtsx', @execution_id=@execution_id OUTPUT, @folder_name=N'BI', @project_name=N'DWH', @use32bitruntime=False, @reference_id=Null Select @execution_id DECLARE @var0 smallint = 1 EXEC [SSISDB].[catalog].[set_execution_parameter_value] @execution_id,  @object_type=50, @parameter_name=N'LOGGING_LEVEL', @parameter_value=@var0 DECLARE @var1 bit = 0 EXEC [SSISDB].[catalog].[set_execution_parameter_value] @execution_id,  @object_type=50, @parameter_name=N'DUMP_ON_ERROR', @parameter_value=@var1 EXEC [SSISDB].[catalog].[start_execution] @execution_id GO The problem here is that the procedure will simply start the execution of the package and will return as soon as the package as been started…thus giving you the opportunity to execute packages asynchrously from your T-SQL code. This is just *great*, but what happens if I what to execute a package and WAIT for it to finish (and thus having a synchronous execution of it)? You have to be sure that you add the “SYNCHRONIZED” parameter to the package execution. Before the start_execution procedure: exec [SSISDB].[catalog].[set_execution_parameter_value] @execution_id,  @object_type=50, @parameter_name=N'SYNCHRONIZED', @parameter_value=1 And that’s it . PS From the RC0, the SYNCHRONIZED parameter is automatically added each time you schedule a package execution through the SQL Server Agent. If you’re using an external scheduler, just keep this post in mind .

    Read the article

  • Can't save data for a member in a data form

    - by RahulS
    Implied sharing is an old thing everyone knows the reasons and solutions of that, still little theory about that: With Essbase implied sharing, some members are shared even if you do not explicitly set them as shared. These members are implied shared members. When an implied share relationship is created, each implied member assumes the other member’s value. Essbase assumes (or implies) a shared member relationship in these situations: 1. A parent has only one child 2. A parent has only one child that consolidates to the parent In a Planning form that contains members with an implied sharing relationship, when a value is added for the parent, the child assumes the same value after the form is saved. Likewise, if a value is added for the child, the parent usually assumes the same value after a form is saved.For example, when a calculation script or load rule populates an implied share member, the other implied share member assumes the value of the member populated by the calculation script or load rule. The last value calculated or imported takes precedence. The result is the same whether you refer to the parent or the child as a variable in a calculation script. For more information have a look at: http://docs.oracle.com/cd/E17236_01/epm.1112/hp_admin_11122/ch14s11.html Now the issue which we are going to talk about is We loose data on save even when the parent is dynamic calc and has a single child. A dynamic calc parent to a single child:  If we design the form with following selection: In the data form we will find parent below the member and this is by design whenever you make a selection using commands to select all the member below parent, always children will appear before the parent: Lets try to enter data, Save it Now, try to change the way we selected members Here we go: Now the question again why this behavior: 1. Data from Planning data form passes to Essbase row by row, 2. Because in data form the child member appears before the parent, 3. First, data goes to Essbase for child (SingleStoreChild), 4. Then when Planning passes the data for parent there was #Missing or No data,  5. Over writes the data to #missing. PS: As we know that dynamic calc members are calculated on the fly they are not allocated with any memory in the Essbase, here the parent was dynamic calc and it was pointing to same memory as child in the background, when Planning was passing data to Essbase for second row it has updated the child with missing data.(Little confusing, let me know if you need more explanation) 6. As one of the solutions just change the order of appearance of parent and child. Cheers..!!! Rahul S. https://www.facebook.com/pages/HyperionPlanning/117320818374228

    Read the article

  • Building dynamic OLAP data marts on-the-fly

    - by DrJohn
    At the forthcoming SQLBits conference, I will be presenting a session on how to dynamically build an OLAP data mart on-the-fly. This blog entry is intended to clarify exactly what I mean by an OLAP data mart, why you may need to build them on-the-fly and finally outline the steps needed to build them dynamically. In subsequent blog entries, I will present exactly how to implement some of the techniques involved. What is an OLAP data mart? In data warehousing parlance, a data mart is a subset of the overall corporate data provided to business users to meet specific business needs. Of course, the term does not specify the technology involved, so I coined the term "OLAP data mart" to identify a subset of data which is delivered in the form of an OLAP cube which may be accompanied by the relational database upon which it was built. To clarify, the relational database is specifically create and loaded with the subset of data and then the OLAP cube is built and processed to make the data available to the end-users via standard OLAP client tools. Why build OLAP data marts? Market research companies sell data to their clients to make money. To gain competitive advantage, market research providers like to "add value" to their data by providing systems that enhance analytics, thereby allowing clients to make best use of the data. As such, OLAP cubes have become a standard way of delivering added value to clients. They can be built on-the-fly to hold specific data sets and meet particular needs and then hosted on a secure intranet site for remote access, or shipped to clients' own infrastructure for hosting. Even better, they support a wide range of different tools for analytical purposes, including the ever popular Microsoft Excel. Extension Attributes: The Challenge One of the key challenges in building multiple OLAP data marts based on the same 'template' is handling extension attributes. These are attributes that meet the client's specific reporting needs, but do not form part of the standard template. Now clearly, these extension attributes have to come into the system via additional files and ultimately be added to relational tables so they can end up in the OLAP cube. However, processing these files and filling dynamically altered tables with SSIS is a challenge as SSIS packages tend to break as soon as the database schema changes. There are two approaches to this: (1) dynamically build an SSIS package in memory to match the new database schema using C#, or (2) have the extension attributes provided as name/value pairs so the file's schema does not change and can easily be loaded using SSIS. The problem with the first approach is the complexity of writing an awful lot of complex C# code. The problem of the second approach is that name/value pairs are useless to an OLAP cube; so they have to be pivoted back into a proper relational table somewhere in the data load process WITHOUT breaking SSIS. How this can be done will be part of future blog entry. What is involved in building an OLAP data mart? There are a great many steps involved in building OLAP data marts on-the-fly. The key point is that all the steps must be automated to allow for the production of multiple OLAP data marts per day (i.e. many thousands, each with its own specific data set and attributes). Now most of these steps have a great deal in common with standard data warehouse practices. The key difference is that the databases are all built to order. The only permanent database is the metadata database (shown in orange) which holds all the metadata needed to build everything else (i.e. client orders, configuration information, connection strings, client specific requirements and attributes etc.). The staging database (shown in red) has a short life: it is built, populated and then ripped down as soon as the OLAP Data Mart has been populated. In the diagram below, the OLAP data mart comprises the two blue components: the Data Mart which is a relational database and the OLAP Cube which is an OLAP database implemented using Microsoft Analysis Services (SSAS). The client may receive just the OLAP cube or both components together depending on their reporting requirements.  So, in broad terms the steps required to fulfil a client order are as follows: Step 1: Prepare metadata Create a set of database names unique to the client's order Modify all package connection strings to be used by SSIS to point to new databases and file locations. Step 2: Create relational databases Create the staging and data mart relational databases using dynamic SQL and set the database recovery mode to SIMPLE as we do not need the overhead of logging anything Execute SQL scripts to build all database objects (tables, views, functions and stored procedures) in the two databases Step 3: Load staging database Use SSIS to load all data files into the staging database in a parallel operation Load extension files containing name/value pairs. These will provide client-specific attributes in the OLAP cube. Step 4: Load data mart relational database Load the data from staging into the data mart relational database, again in parallel where possible Allocate surrogate keys and use SSIS to perform surrogate key lookup during the load of fact tables Step 5: Load extension tables & attributes Pivot the extension attributes from their native name/value pairs into proper relational tables Add the extension attributes to the views used by OLAP cube Step 6: Deploy & Process OLAP cube Deploy the OLAP database directly to the server using a C# script task in SSIS Modify the connection string used by the OLAP cube to point to the data mart relational database Modify the cube structure to add the extension attributes to both the data source view and the relevant dimensions Remove any standard attributes that not required Process the OLAP cube Step 7: Backup and drop databases Drop staging database as it is no longer required Backup data mart relational and OLAP database and ship these to the client's infrastructure Drop data mart relational and OLAP database from the build server Mark order complete Start processing the next order, ad infinitum. So my future blog posts and my forthcoming session at the SQLBits conference will all focus on some of the more interesting aspects of building OLAP data marts on-the-fly such as handling the load of extension attributes and how to dynamically alter the structure of an OLAP cube using C#.

    Read the article

  • Is Data Science “Science”?

    - by BuckWoody
    I hold the term “science” in very high esteem. I grew up on the Space Coast in Florida, and eventually worked at the Kennedy Space Center, surrounded by very intelligent people who worked in various scientific fields. Recently a new term has entered the computing dialog – “Data Scientist”. Since it’s not a standard term, it has a lot of definitions, and in fact has been disputed as a correct term. After all, the reasoning goes, if there’s no such thing as “Data Science” then how can there be a Data Scientist? This argument has been made before, albeit with a different term – “Computer Science”. In Peter Denning’s excellent article “Is Computer Science Science” (April  2005/Vol. 48, No. 4 COMMUNICATIONS OF THE ACM) there are many points that separate “science” from “engineering” and even “art”.  I won’t repeat the content of that article here (I recommend you read it on your own) but will leverage the points he makes there. Definition of Science To ask the question “is data science ‘science’” then we need to start with a definition of terms. Various references put the definition into the same basic areas: Study of the physical world Systematic and/or disciplined study of a subject area ...and then they include the things studied, the bodies of knowledge and so on. The word itself comes from Latin, and means merely “to know” or “to study to know”. Greek divides knowledge further into “truth” (episteme), and practical use or effects (tekhne). Normally computing falls into the second realm. Definition of Data Science And now a more controversial definition: Data Science. This term is so new and perhaps so niche that the major dictionaries haven’t yet picked it up (my OED reference is older – can’t afford to pop for the online registration at present). Researching the term's general use I created an amalgam of the definitions this way: “Studying and applying mathematical and other techniques to derive information from complex data sets.” Using this definition, data science certainly seems to be science - it's learning about and studying some object or area using systematic methods. But implicit within the definition is the word “application”, which makes the process more akin to engineering or even technology than science. In fact, I find that using these techniques – and data itself – part of science, not science itself. I leave out the concept of studying data patterns or algorithms as part of this discipline. That is actually a domain I see within research, mathematics or computer science. That of course is a type of science, but does not seek for practical applications. As part of the argument against calling it “Data Science”, some point to the scientific method of creating a hypothesis, testing with controls, testing results against the hypothesis, and documenting for repeatability.  These are not steps that we often take in working with data. We normally start with a question, and fit patterns and algorithms to predict outcomes and find correlations. In this way Data Science is more akin to statistics (and in fact makes heavy use of them) in the process rather than starting with an assumption and following on with it. So, is Data Science “Science”? I’m uncertain – and I’m uncertain it matters. Even if we are facing rampant “title inflation” these days (does anyone introduce themselves as a secretary or supervisor anymore?) I can tolerate the term at least from the intent that we use data to study problems across a wide spectrum, rather than restricting it to a single domain. And I also understand those who have worked hard to achieve the very honorable title of “scientist” who have issues with those who borrow the term without asking. What do you think? Science, or not? Does it matter?

    Read the article

  • Data Modeling Resources

    - by Dejan Sarka
    You can find many different data modeling resources. It is impossible to list all of them. I selected only the most valuable ones for me, and, of course, the ones I contributed to. Books Chris J. Date: An Introduction to Database Systems – IMO a “must” to understand the relational model correctly. Terry Halpin, Tony Morgan: Information Modeling and Relational Databases – meet the object-role modeling leaders. Chris J. Date, Nikos Lorentzos and Hugh Darwen: Time and Relational Theory, Second Edition: Temporal Databases in the Relational Model and SQL – all theory needed to manage temporal data. Louis Davidson, Jessica M. Moss: Pro SQL Server 2012 Relational Database Design and Implementation – the best SQL Server focused data modeling book I know by two of my friends. Dejan Sarka, et al.: MCITP Self-Paced Training Kit (Exam 70-441): Designing Database Solutions by Using Microsoft® SQL Server™ 2005 – SQL Server 2005 data modeling training kit. Most of the text is still valid for SQL Server 2008, 2008 R2, 2012 and 2014. Itzik Ben-Gan, Lubor Kollar, Dejan Sarka, Steve Kass: Inside Microsoft SQL Server 2008 T-SQL Querying – Steve wrote a chapter with mathematical background, and I added a chapter with theoretical introduction to the relational model. Itzik Ben-Gan, Dejan Sarka, Roger Wolter, Greg Low, Ed Katibah, Isaac Kunen: Inside Microsoft SQL Server 2008 T-SQL Programming – I added three chapters with theoretical introduction and practical solutions for the user-defined data types, dynamic schema and temporal data. Dejan Sarka, Matija Lah, Grega Jerkic: Training Kit (Exam 70-463): Implementing a Data Warehouse with Microsoft SQL Server 2012 – my first two chapters are about data warehouse design and implementation. Courses Data Modeling Essentials – I wrote a 3-day course for SolidQ. If you are interested in this course, which I could also deliver in a shorter seminar way, you can contact your closes SolidQ subsidiary, or, of course, me directly on addresses [email protected] or [email protected]. This course could also complement the existing courseware portfolio of training providers, which are welcome to contact me as well. Logical and Physical Modeling for Analytical Applications – online course I wrote for Pluralsight. Working with Temporal data in SQL Server – my latest Pluralsight course, where besides theory and implementation I introduce many original ways how to optimize temporal queries. Forthcoming presentations SQL Bits 12, July 17th – 19th, Telford, UK – I have a full-day pre-conference seminar Advanced Data Modeling Topics there.

    Read the article

  • Using Find/Replace with regular expressions inside a SSIS package

    - by jamiet
    Another one of those might-be-useful-again-one-day-so-I’ll-share-it-in-a-blog-post blog posts I am currently working on a SQL Server Integration Services (SSIS) 2012 implementation where each package contains a parameter called ETLIfcHist_ID: During normal execution this will get altered when the package is executed from the Execute Package Task however we want to make sure that at deployment-time they all have a default value of –1. Of course, they tend to get changed during development so I wanted a way of easily changing them all back to the default value. Opening up each package in turn and editing them was an option but given that we have over 40 packages and we might want to carry out this reset fairly frequently I needed a more automated method so I turned to Visual Studio’s Find/Replace… feature Of course, we don’t know what value will be in that parameter so I can’t simply search for a particular value; hence I opted to use a regular expression to identify the value to be change. In the rest of this blog post I’ll explain how to do that. For demonstration purposes I have taken the contents of a .dtsx file and stripped out everything except the element containing the parameters (<DTS:PackageParameters>), if you want to play along at home you can copy-paste the XML document below into a new XML file and open it up in Visual Studio: <?xml version="1.0"?> <DTS:Executable xmlns:DTS="www.microsoft.com/SqlServer/Dts">   <DTS:PackageParameters>     <DTS:PackageParameter       DTS:CreationName=""       DTS:DataType="3"       DTS:Description="InterfaceHistory_ID: used for Lineage"       DTS:DTSID="{635616DB-EEEE-45C8-89AA-713E25846C7E}"       DTS:ObjectName="ETLIfcHist_ID">       <DTS:Property         DTS:DataType="3"         DTS:Name="ParameterValue">VALUE_TO_BE_CHANGED</DTS:Property>     </DTS:PackageParameter>     <DTS:PackageParameter       DTS:CreationName=""       DTS:DataType="3"       DTS:Description="Some other description"       DTS:DTSID="{635616DB-EEEE-45C8-89AA-713E25845C7E}"       DTS:ObjectName="SomeOtherObjectName">       <DTS:Property         DTS:DataType="3"         DTS:Name="ParameterValue">SomeOtherValue</DTS:Property>     </DTS:PackageParameter>   </DTS:PackageParameters> </DTS:Executable> We are trying to identify the value of the parameter whose name is ETLIfcHist_ID – notice that in the XML document above that value is VALUE_TO_BE_CHANGED. The following regular expression will find the appropriate portion of the XML document: {\<DTS\:PackageParameter[\n ]*DTS\:CreationName="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:DataType="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:Description="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:DTSID="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:ObjectName="ETLIfcHist_ID"\>[\n ]*\<DTS\:Property[\n ]*DTS\:DataType="[A-Za-z0-9\:_\{\}- ]*"[\n ]*DTS\:Name="ParameterValue"\>}[A-Za-z0-9\:_\{\}- ]*{\<\/DTS\:Property\>} I have highlighted the name of the parameter that we’re looking for. I have also highlighted two portions identified by pairs of curly braces “{…}”; these are important because they pick out the two portions either side of the value I want to replace, in other words the portions highlighted here: <DTS:PackageParameters>     <DTS:PackageParameter       DTS:CreationName=""       DTS:DataType="3"       DTS:Description="InterfaceHistory_ID: used for Lineage"       DTS:DTSID="{635616DB-EEEE-45C8-89AA-713E25846C7E}"       DTS:ObjectName="ETLIfcHist_ID">       <DTS:Property         DTS:DataType="3"         DTS:Name="ParameterValue">VALUE_TO_BE_CHANGED</DTS:Property>     </DTS:PackageParameter> Those sections in the curly braces are termed tag expressions and can be identified in the replace expression using a backslash and a number identifying which tag expression you’re referring to according to its ordinal position. Hence, our replace expression is simply: \1-1\2 We’re saying the portion of our file identified by the regular expression should be replaced by the first curly brace section, then the literal –1, then the second curly brace section. Make sense? Give it a go yourself by plugging those two expressions into Visual Studio’s Find and Replace dialog. If you set it to look in “All Open Documents” then you can open up the code-behind of all your packages and change all of them at once. The Find and Replace dialog will look like this: That’s it! I realise that not everyone will be looking to change the value of a parameter but hopefully I have shown you a technique that you can modify to work for your own scenario. Given that this blog post is, y’know, on the web I have no doubt that someone is going to find a fault with my find regex expression and if that person is you….that’s OK. Let me know about it in the comments below and perhaps we can work together to come up with something better! Note that some parameters may have a different set of properties (for example some, but not all, of my parameters have a DTS:Required attribute) so your find regular expression may have to change accordingly. When researching this I found the following article to be invaluable: Visual Studio Find/Replace Regular Expression Usage @Jamiet

    Read the article

  • Why Oracle Data Integrator for Big Data?

    - by Mala Narasimharajan
    Big Data is everywhere these days - but what exactly is it? It’s data that comes from a multitude of sources – not only structured data, but unstructured data as well.  The sheer volume of data is mindboggling – here are a few examples of big data: climate information collected from sensors, social media information, digital pictures, log files, online video files, medical records or online transaction records.  These are just a few examples of what constitutes big data.   Embedded in big data is tremendous value and being able to manipulate, load, transform and analyze big data is key to enhancing productivity and competitiveness.  The value of big data lies in its propensity for greater in-depth analysis and data segmentation -- in turn giving companies detailed information on product performance, customer preferences and inventory.  Furthermore, by being able to store and create more data in digital form, “big data can unlock significant value by making information transparent and usable at much higher frequency." (McKinsey Global Institute, May 2011) Oracle's flagship product for bulk data movement and transformation, Oracle Data Integrator, is a critical component of Oracle’s Big Data strategy. ODI provides automation, bulk loading, and validation and transformation capabilities for Big Data while minimizing the complexities of using Hadoop.  Specifically, the advantages of ODI in a Big Data scenario are due to pre-built Knowledge Modules that drive processing in Hadoop. This leverages the graphical UI to load and unload data from Hadoop, perform data validations and create mapping expressions for transformations.  The Knowledge Modules provide a key jump-start and eliminate a significant amount of Hadoop development.  Using Oracle Data Integrator together with Oracle Big Data Connectors, you can simplify the complexities of mapping, accessing, and loading big data (via NoSQL or HDFS) but also correlating your enterprise data – this correlation may require integrating across heterogeneous and standards-based environments, connecting to Oracle Exadata, or sourcing via a big data platform such as Oracle Big Data Appliance. To learn more about Oracle Data Integration and Big Data, download our resource kit to see the latest in whitepapers, webinars, downloads, and more… or go to our website on www.oracle.com/bigdata

    Read the article

  • General Policies and Procedures for Maintaining the Value of Data Assets

    Here is a general list for policies and procedures regarding maintaining the value of data assets. Data Backup Policies and Procedures Backups are very important when dealing with data because there is always the chance of losing data due to faulty hardware or a user activity. So the need for a strategic backup system should be mandatory for all companies. This being said, in the real world some companies that I have worked for do not really have a good data backup plan. Typically when companies tend to take this kind of approach in data backups usually the data is not really recoverable.  Unfortunately when companies do not regularly test their backup plans they get a false sense of security because they think that they are covered. However, I can tell you from personal and professional experience that a backup plan/system is never fully implemented until it is regularly tested prior to the time when it actually needs to be used. Disaster Recovery Plan Expanding on Backup Policies and Procedures, a company needs to also have a disaster recovery plan in order to protect its data in case of a catastrophic disaster.  Disaster recovery plans typically encompass how to restore all of a company’s data and infrastructure back to a restored operational status.  Most Disaster recovery plans also include time estimates on how long each step of the disaster recovery plan should take to be executed.  It is important to note that disaster recovery plans are never fully implemented until they have been tested just like backup plans. Disaster recovery plans should be tested regularly so that the business can be confident in not losing any or minimal data due to a catastrophic disaster. Firewall Policies and Content Filters One way companies can protect their data is by using a firewall to separate their internal network from the outside. Firewalls allow for enabling or disabling network access as data passes through it by applying various defined restrictions. Furthermore firewalls can also be used to prevent access from the internal network to the outside by these same factors. Common Firewall Restrictions Destination/Sender IP Address Destination/Sender Host Names Domain Names Network Ports Companies can also desire to restrict what their network user’s view on the internet through things like content filters. Content filters allow a company to track what webpages a person has accessed and can also restrict user’s access based on established rules set up in the content filter. This device and/or software can block access to domains or specific URLs based on a few factors. Common Content Filter Criteria Known malicious sites Specific Page Content Page Content Theme  Anti-Virus/Mal-ware Polices Fortunately, most companies utilize antivirus programs on all computers and servers for good reason, virus have been known to do the following: Corrupt/Invalidate Data, Destroy Data, and Steal Data. Anti-Virus applications are a great way to prevent any malicious application from being able to gain access to a company’s data.  However, anti-virus programs must be constantly updated because new viruses are always being created, and the anti-virus vendors need to distribute updates to their applications so that they can catch and remove them. Data Validation Policies and Procedures Data validation is very important to ensure that only accurate information is stored. The existence of invalid data can cause major problems when businesses attempt to use data for knowledge based decisions and for performance reporting. Data Scrubbing Policies and Procedures Data scrubbing is valuable to companies in one of two ways. The first can be used to clean data prior to being analyzed for report generation. The second is that it allows companies to remove things like personally Identifiable information from its data prior to transmit it between multiple environments or if the information is sent to an external location. An example of this can be seen with medical records in regards to HIPPA laws that prohibit the storage of specific personal and medical information. Additionally, I have professionally run in to a scenario where the Canadian government does not allow any Canadian’s personal information to be stored on a server not located in Canada. Encryption Practices The use of encryption is very valuable when a company needs to any personal information. This allows users with the appropriated access levels to view or confirm the existence or accuracy of data within a system by either decrypting the information or encrypting a piece of data and comparing it to the stored version.  Additionally, if for some unforeseen reason the data got in to the wrong hands then they would have to first decrypt the data before they could even be able to read it. Encryption just adds and additional layer of protection around data itself. Standard Normalization Practices The use of standard data normalization practices is very important when dealing with data because it can prevent allot of potential issues by eliminating the potential for unnecessary data duplication. Issues caused by data duplication include excess use of data storage, increased chance for invalidated data, and over use of data processing. Network and Database Security/Access Policies Every company has some form of network/data access policy even if they have none. These policies help secure data from being seen by inappropriate users along with preventing the data from being updated or deleted by users. In addition, without a good security policy there is a large potential for data to be corrupted by unassuming users or even stolen. Data Storage Policies Data storage polices are very important depending on how they are implemented especially when a company is trying to utilize them in conjunction with other policies like Data Backups. I have worked at companies where all network user folders are constantly backed up, and if a user wanted to ensure the existence of a piece of data in the form of a file then they had to store that file in their network folder. Conversely, I have also worked in places where when a user logs on or off of the network there entire user profile is backed up. Training Policies One of the biggest ways to prevent data loss and ensure that data will remain a company asset is through training. The practice of properly train employees on how to work with in systems that access data is crucial when trying to ensure a company’s data will remain an asset. Users need to be trained on how to manipulate a company’s data in order to perform their tasks to reduce the chances of invalidating data.

    Read the article

  • Inability to detect the Output from inside a SSIS script component

    - by Danaja
    In the script of the script components the Output buffer is not being detected as an existing component. I am trying to use the following piece of code Output0Buffer.AddRow(); within the public override void Input0_ProcessInputRow(Input0Buffer Row) method. I know it should be available within this method because at the moment I am copying and using a component from a previous project that has this code and it works. but when I create a new component and put the same code in it doesn't Can any one explain why this is happening?

    Read the article

  • Importing Multiple Schemas to a Model in Oracle SQL Developer Data Modeler

    - by thatjeffsmith
    Your physical data model might stretch across multiple Oracle schemas. Or maybe you just want a single diagram containing tables, views, etc. spanning more than a single user in the database. The process for importing a data dictionary is the same, regardless if you want to suck in objects from one schema, or many schemas. Let’s take a quick look at how to get started with a data dictionary import. I’m using Oracle SQL Developer in this example. The process is nearly identical in Oracle SQL Developer Data Modeler – the only difference being you’ll use the ‘File’ menu to get started versus the ‘File – Data Modeler’ menu in SQL Developer. Remember, the functionality is exactly the same whether you use SQL Developer or SQL Developer Data Modeler when it comes to the data modeling features – you’ll just have a cleaner user interface in SQL Developer Data Modeler. Importing a Data Dictionary to a Model You’ll want to open or create your model first. You can import objects to an existing or new model. The easiest way to get started is to simply open the ‘Browser’ under the View menu. The Browser allows you to navigate your open designs/models You’ll see an ‘Untitled_1′ model by default. I’ve renamed mine to ‘hr_sh_scott_demo.’ Now go back to the File menu, and expand the ‘Data Modeler’ section, and select ‘Import – Data Dictionary.’ This is a fancy way of saying, ‘suck objects out of the database into my model’ Connect! If you haven’t already defined a connection to the database you want to reverse engineer, you’ll need to do that now. I’m going to assume you already have that connection – so select it, and hit the ‘Next’ button. Select the Schema(s) to be imported Select one or more schemas you want to import The schemas selected on this page of the wizard will dictate the lists of tables, views, synonyms, and everything else you can choose from in the next wizard step to import. For brevity, I have selected ALL tables, views, and synonyms from 3 different schemas: HR SCOTT SH Once I hit the ‘Finish’ button in the wizard, SQL Developer will interrogate the database and add the objects to our model. The Big Model and the 3 Little Models I can now see ALL of the objects I just imported in the ‘hr_sh_scott_demo’ relational model in my design tree, and in my relational diagram. Quick Tip: Oracle SQL Developer calls what most folks think of as a ‘Physical Model’ the ‘Relational Model.’ Same difference, mostly. In SQL Developer, a Physical model allows you to define partitioning schemes, advanced storage parameters, and add your PL/SQL code. You can have multiple physical models per relational models. For example I might have a 4 Node RAC in Production that uses partitioning, but in test/dev, only have a single instance with no partitioning. I can have models for both of those physical implementations. The list of tables in my relational model Wouldn’t it be nice if I could segregate the objects based on their schema? Good news, you can! And it’s done by default Several of you might already know where I’m going with this – SUBVIEWS. You can easily create a ‘SubView’ by selecting one or more objects in your model or diagram and add them to a new SubView. SubViews are just mini-models. They contain a subset of objects from the main model. This is very handy when you want to break your model into smaller, more digestible parts. The model information is identical across the model and subviews, so you don’t have to worry about making a change in one place and not having it propagate across your design. SubViews can be used as filters when you create reports and exports as well. So instead of generating a PDF for everything, just show me what’s in my ‘ABC’ subview. But, I don’t want to do any work! Remember, I’m really lazy. More good news – it’s already done by default! The schemas are automatically used to create default SubViews Auto-Navigate to the Object in the Diagram In the subview tree node, right-click on the object you want to navigate to. You can ask to be taken to the main model view or to the SubView location. If you haven’t already opened the SubView in the diagram, it will be automatically opened for you. The SubView diagram only contains the objects from that SubView Your SubView might still be pretty big, many dozens of objects, so don’t forget about the ‘Navigator‘ either! In summary, use the ‘Import’ feature to add existing database objects to your model. If you import from multiple schemas, take advantage of the default schema based SubViews to help you manage your models! Sometimes less is more!

    Read the article

  • In SSIS Convert European Currency Format to United States Currency Format

    - by Rob
    I have an interesting problem. I have an SSIS package that processes account data. We are now processing files from Europe. These files are in a CSV format using text qualifiers. For an example of the problem: In the United States the currency format is 123456.99 (We purposely leave the thousands separator out). The files sent from Europe are coming in with two formats. One is 123456,99 and the other is 123.456,00. SSIS is attempting to parse the text file and place it into a NUMERIC(20,2) field. This causes a parsing error in SSIS even with the text qualifiers. If I change the field to CURRENCY it sends a conversion error. I would like for SSIS to deal with this directly without requiring the data to be in the United States format. Has anyone had this problem? Any help will be greatly appreciated. Rob

    Read the article

  • MS SQL dts to ssis migration error

    - by Manjot
    Hi, I have migrated some DTS packages to SSIS 2005 using "Migration" wizard. When I tried to run it, it fails saying you need a higher version of SSIS even though the destination SSIS server is on 9.0.4211 level. then I digged in the package using business intelligence studio and saw that one of the package subtasks is "Transform data task" (the dts version) and the package fails to run that. The storage location for this dts task is set to "Embedded in Task". I didn't touch it. why didn't it convert this task to an SSIS data flow task? any help please? Thansk in advance

    Read the article

  • How *not* to handle a compensation step on failure in an SSIS package

    - by James Luetkehoelter
    Just stumbed across this where I'm working. Someone created a global error handler for a package that included this SQL step: DELETE FROM Table WHERE DateDiff(MI, ExportedDate, GetDate()) < 5 So if the package runs for longer than 5 minutes and fails, nothing gets cleaned up. Please people, don't do this... Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!...(read more)

    Read the article

  • SQL Server Training in the UK–SSIS, MDX, Admin, MDS, Internals

    - by simonsabin
    If you are looking for SQL Server training they there is no better place to start than a new company Technitrain Its been setup by a fellow MVP and SQLBits Organiser Chris Webb. Why this company rather than any others? Training based on real world experience by the best in the business. The key to Technitrain’s model is not to cram the shelves high with courses and get some average Joe trainers to deliver them. Technitrain bring in world renowned experts in their fields to deliver courses written...(read more)

    Read the article

  • Consolidating SQL Server Error Logs from Multiple Instances Using SSIS

    SQL Server hides a lot of very useful information in its error log files. Unfortunately, the process of hunting through all these logs, file-by-file, server-by-server, can cause a problem. Rodney Landrum offers a solution which will allow you to pull error log records from multiple servers into a central database, for analysis and reporting with T-SQL.

    Read the article

  • Is SSIS able to query flat files from another Windows Server?

    - by atricapilla
    I pretty new SQL Server Integration Server (SSIS) user. Is SSIS able to query data from text files located in another Windows Server? I mean that when SSIS is installed on Windwos Server A, is SSIS able to query data from e.g. one folder containing text files in Windows Server B (under same domain)? I have used only SAP BO Data Integrator ETL tool and it cannot query flat files from another Server: during execution, all files must be located on the Job Server machine that executes the job.

    Read the article

  • Deploying Data Mining Models using Model Export and Import, Part 2

    - by [email protected]
    In my last post, Deploying Data Mining Models using Model Export and Import, we explored using DBMS_DATA_MINING.EXPORT_MODEL and DBMS_DATA_MINING.IMPORT_MODEL to enable moving a model from one system to another. In this post, we'll look at two distributed scenarios that make use of this capability and a tip for easily moving models from one machine to another using only Oracle Database, not an external file transport mechanism, such as FTP. The first scenario, consider a company with geographically distributed business units, each collecting and managing their data locally for the products they sell. Each business unit has in-house data analysts that build models to predict which products to recommend to customers in their space. A central telemarketing business unit also uses these models to score new customers locally using data collected over the phone. Since the models recommend different products, each customer is scored using each model. This is depicted in Figure 1.Figure 1: Target instance importing multiple remote models for local scoring In the second scenario, consider multiple hospitals that collect data on patients with certain types of cancer. The data collection is standardized, so each hospital collects the same patient demographic and other health / tumor data, along with the clinical diagnosis. Instead of each hospital building it's own models, the data is pooled at a central data analysis lab where a predictive model is built. Once completed, the model is distributed to hospitals, clinics, and doctor offices who can score patient data locally.Figure 2: Multiple target instances importing the same model from a source instance for local scoring Since this blog focuses on model export and import, we'll only discuss what is necessary to move a model from one database to another. Here, we use the package DBMS_FILE_TRANSFER, which can move files between Oracle databases. The script is fairly straightforward, but requires setting up a database link and directory objects. We saw how to create directory objects in the previous post. To create a database link to the source database from the target, we can use, for example: create database link SOURCE1_LINK connect to <schema> identified by <password> using 'SOURCE1'; Note that 'SOURCE1' refers to the service name of the remote database entry in your tnsnames.ora file. From SQL*Plus, first connect to the remote database and export the model. Note that the model_file_name does not include the .dmp extension. This is because export_model appends "01" to this name.  Next, connect to the local database and invoke DBMS_FILE_TRANSFER.GET_FILE and import the model. Note that "01" is eliminated in the target system file name.  connect <source_schema>/<password>@SOURCE1_LINK; BEGIN  DBMS_DATA_MINING.EXPORT_MODEL ('EXPORT_FILE_NAME' || '.dmp',                                 'MY_SOURCE_DIR_OBJECT',                                 'name =''MY_MINING_MODEL'''); END; connect <target_schema>/<password>; BEGIN  DBMS_FILE_TRANSFER.GET_FILE ('MY_SOURCE_DIR_OBJECT',                               'EXPORT_FILE_NAME' || '01.dmp',                               'SOURCE1_LINK',                               'MY_TARGET_DIR_OBJECT',                               'EXPORT_FILE_NAME' || '.dmp' );  DBMS_DATA_MINING.IMPORT_MODEL ('EXPORT_FILE_NAME' || '.dmp',                                 'MY_TARGET_DIR_OBJECT'); END; To clean up afterward, you may want to drop the exported .dmp file at the source and the transferred file at the target. For example, utl_file.fremove('&directory_name', '&model_file_name' || '.dmp');

    Read the article

< Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13  | Next Page >