Search Results

Search found 58774 results on 2351 pages for 'ssis data tranformations'.

Page 19/2351 | < Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26 | Next Page >

Subsetting a data frame in a function using another data frame as parameter

- by lecodesportif

I would like to submit a data frame to a function and use it to subset another data frame. This is the basic data frame: foo <- data.frame(var1= c('1', '1', '1', '2', '2', '3'), var2=c('A', 'A', 'B', 'B', 'C', 'C')) I use the following function to find out the frequencies of var2 for specified values of var1. foobar <- function(x, y, z){ a <- subset(x, (x$var1 == y)) b <- subset(a, (a$var2 == z)) n=nrow(b) return(n) } Examples: foobar(foo, 1, "A") # returns 2 foobar(foo, 1, "B") # returns 1 foobar(foo, 3, "C") # returns 1 This works. But now I want to submit a data frame of values to foobar. Instead of the above examples, I would like to submit df to foobar and get the same results as above (2, 1, 1) df <- data.frame(var1=c('1','1','3'), var2=c("A", "B", "C")) When I change foobar to accept two arguments like foobar(foo, df) and use y[, c(var1)] and y[, c(var2)] instead of the two parameters x and y it still doesn't work. Which way is there to do this?

Read the article
Managing Data Dependecies of Java Classes that Load Data from the Classpath at Runtime

- by Martin Potthast

What is the simplest way to manage dependencies of Java classes to data files present in the classpath? More specifically: How should data dependencies be annotated? Perhaps using Java annotations (e.g., @Data)? Or rather some build entries in a build script or a properties file? Is there build tool that integrates and evaluates such information (Ant, Scons, ...)? Do you have examples? Consider the following scenario: A few lines of Ant create a Jar from my sources that includes everything found on the classpath. Then jarjar is used to remove all .class files that are not necessary to execute, say, class Foo. The problem is that all the data files that class Bar depends upon are still there in the Jar. The ideal deployment script, however, would recognize that the data files on which only class Bar depends can be removed while data files on which class Foo depends must be retained. Any hints?

Read the article
A deadlock was detected while trying to lock variables in SSIS

Error: 0xC001405C at SQL Log Status: A deadlock was detected while trying to lock variables "User::RowCount" for read/write access. A lock cannot be acquired after 16 attempts. The locks timed out. Have you ever considered variable locking when building your SSIS packages? I expect many people haven’t just because most of the time you never see an error like the one above. I’ll try and explain a few key concepts about variable locking and hopefully you never will see that error. First of all, what is all this variable locking all about? Put simply SSIS variables have to be locked before they can be accessed, and then of course unlocked once you have finished with them. This is baked into SSIS, presumably to reduce the risk of race conditions, but with that comes some additional overhead in that you need to be careful to avoid lock conflicts in some scenarios. The most obvious place you will come across any hint of locking (no pun intended) is the Script Task or Script Component with their ReadOnlyVariables and ReadWriteVariables properties. These two properties allow you to enter lists of variables to be used within the task, or to put it another way, these lists of variables to be locked, so that they are available within the task. During the task pre-execute phase the variables and locked, you then use them during the execute phase when you code is run, and then unlocked for you during the post-execute phase. So by entering the variable names in one of the two list, the locking is taken care of for you, and you just read and write to the Dts.Variables collection that is exposed in the task for the purpose. As you can see in the image above, the variable PackageInt is specified, which means when I write the code inside that task I don’t have to worry about locking at all, as shown below. public void Main() { // Set the variable value to something new Dts.Variables["PackageInt"].Value = 199; // Raise an event so we can play in the event handler bool fireAgain = true; Dts.Events.FireInformation(0, "Script Task Code", "This is the script task raising an event.", null, 0, ref fireAgain); Dts.TaskResult = (int)ScriptResults.Success; } As you can see as well as accessing the variable, hassle free, I also raise an event. Now consider a scenario where I have an event hander as well as shown below. Now what if my event handler uses tries to use the same variable as well? Well obviously for the point of this post, it fails with the error quoted previously. The reason why is clearly illustrated if you consider the following sequence of events. Package execution starts Script Task in Control Flow starts Script Task in Control Flow locks the PackageInt variable as specified in the ReadWriteVariables property Script Task in Control Flow executes script, and the On Information event is raised The On Information event handler starts Script Task in On Information event handler starts Script Task in On Information event handler attempts to lock the PackageInt variable (for either read or write it doesn’t matter), but will fail because the variable is already locked. The problem is caused by the event handler task trying to use a variable that is already locked by the task in Control Flow. Events are always raised synchronously, therefore the task in Control Flow that is raising the event will not regain control until the event handler has completed, so we really do have un-resolvable locking conflict, better known as a deadlock. In this scenario we can easily resolve the problem by managing the variable locking explicitly in code, so no need to specify anything for the ReadOnlyVariables and ReadWriteVariables properties. public void Main() { // Set the variable value to something new, with explicit lock control Variables lockedVariables = null; Dts.VariableDispenser.LockOneForWrite("PackageInt", ref lockedVariables); lockedVariables["PackageInt"].Value = 199; lockedVariables.Unlock(); // Raise an event so we can play in the event handler bool fireAgain = true; Dts.Events.FireInformation(0, "Script Task Code", "This is the script task raising an event.", null, 0, ref fireAgain); Dts.TaskResult = (int)ScriptResults.Success; } Now the package will execute successfully because the variable lock has already been released by the time the event is raised, so no conflict occurs. For those of you with a SQL Engine background this should all sound strangely familiar, and boils down to getting in and out as fast as you can to reduce the risk of lock contention, be that SQL pages or SSIS variables. Unfortunately we cannot always manage the locking ourselves. The Execute SQL Task is very often used in conjunction with variables, either to pass in parameter values or get results out. Either way the task will manage the locking for you, and will fail when it cannot lock the variables it requires. The scenario outlined above is clear cut deadlock scenario, both parties are waiting on each other, so it is un-resolvable. The mechanism used within SSIS isn’t actually that clever, and whilst the message says it is a deadlock, it really just means it tried a few times, and then gave up. The last part of the error message is actually the most accurate in terms of the failure, A lock cannot be acquired after 16 attempts. The locks timed out. Now this may come across as a recommendation to always manage locking manually in the Script Task or Script Component yourself, but I think that would be an overreaction. It is more of a reminder to be aware that in high concurrency scenarios, especially when sharing variables across multiple objects, locking is important design consideration. Update – Make sure you don’t try and use explicit locking as well as leaving the variable names in the ReadOnlyVariables and ReadWriteVariables lock lists otherwise you’ll get the deadlock error, you cannot lock a variable twice!

Read the article
Best way to implement user-powered data validation

- by vegetables

I run a product recommendation engine and I'm hitting a few snags. I'm looking to see if anyone has any recommendations on what I should do to minimize these issues. Here's how the site works: Users come to the site and are presented with product recommendations based on some criteria. If a user knows of a product that is not in our system, they can add it by providing the product name and manufacturer. We take that information, and: Hit one API to gather all the product meta-data (and to validate the product spelling, etc). If the product is not in this first API, we do not allow it in our system. Use the information from step 1 to hit another API for pricing information (gathered from many places online). For the sake of discussion, assume that I am searching both APIs in the most efficient/successful manner possible. For the most part, this works very well. I'd say ~80% of our data is perfectly accurate, but there are a few issues: Sometimes the pricing API (Step 2) doesn't have any information for the product. The way the pricing API is built, it will always return something (theoretically, the closest possible match), and there's no guarantee that the product name is spelled exactly the same way in both APIs, so there's no automated way of knowing if it's the right product. When the pricing API finds the right product, occasionally it has outdated, or even invalid pricing data (e.g. if it screen-scraped the wrong price from a website). Since the site was fairly small at first, I was able to manually verify every product that was added to the website. However, the site has grown to the point where this is taking several hours per day, and is just not efficient use of my time. So, my question is: Aside from hiring someone (or getting an intern) to validate all the data manually, what would be the best system of letting my userbase self-manage the data. Specifically, how can I allow users to edit the data while minimizing the risk of someone ambushing my website, or accidentally setting the data incorrectly.

Read the article
data handling with javascript

- by Vincent Warmerdam

Python has a very neat package called pandas which allows for quick data transformation; tables, aggregation, that sort of thing. A lot of these types of functionality can also be found in the python itertools module. The plyR package in R is also very similar. Usually one woud use this functionality to produce a table which is later visualized with a plot. I am personally very fond of d3, and I would like to allow the user to first indicate what type of data aggregation he wants on the dataset before it is visualized. The visualisation in question involves making a heatmap where the user gets to select the size of the bins of the heatmap beforehand (I want d3 to project this through leaflet). I want to visually select the ideal size of the bins for the heatmap. The way I work now is that I take the dataset, aggregate it with python and then manually load it in d3. This is a process that takes a lot of human effort and I was wondering if the data aggregation can be done through the javascript of the browser. I couldn't find a package for javascript specifically built for data, suggesting (to me) that this is a bad idea and that one should not use javascript for the data handling. Is there a good module/package for javascript to handle data aggregation? Is it a good/bad idea to do the data aggregation in javascript (performance wise)?

Read the article
Customizing the NUnit GUI for data-driven testing

- by rwong

My test project consists of a set of input data files which is fed into a piece of legacy third-party software. Since the input data files for this software are difficult to construct (not something that can be done intentionally), I am not going to add new input data files. Each input data file will be subject to a set of "test functions". Some of the test functions can be invoked independently. Other test functions represent the stages of a sequential operation - if an earlier stage fails, the subsequent stages do not need to be executed. I have experimented with the NUnit parametrized test case (TestCaseAttribute and TestCaseSourceAttribute), passing in the list of data files as test cases. I am generally satisfied with the the ability to select the input data for testing. However, I would like to see if it is possible to customize its GUI's tree structure, so that the "test functions" become the children of the "input data". For example: File #1 CheckFileTypeTest GetFileTopLevelStructureTest CompleteProcessTest StageOneTest StageTwoTest StageThreeTest File #2 CheckFileTypeTest GetFileTopLevelStructureTest CompleteProcessTest StageOneTest StageTwoTest StageThreeTest This will be useful for identifying the stage that failed during the processing of a particular input file. Is there any tips and tricks that will enable the new tree layout? Do I need to customize NUnit to get this layout?

Read the article
Test Data in a Distributed System

- by Davin Tryon

A question that has been vexing me lately has been about how to effectively test (end-to-end) features in a distributed system. Particuarly, how to effectively manage (through time) test data for feature testing. The system in question is a typical SOA setup. The composition is done in JavaScript when call to several REST APIs. Each service is built as an independent block. Each service has some kind of persistent storage (SQL Server in most cases). The main issue at the moment is how to approach test data when testing end-to-end features. Functional end-to-end testing occurs through the UI, and it is therefore necessary for test data to be set up before the test run (this could be manual or automated testing). As is typical in a distributed system, identifiers from one service are used as a link in another service. So, some level of synchronization needs to be present in the data to effectively test. What is the best way to manage and set up this data after a successful deployment to a test environment? For example, is it better to manage this test data inside each service? Or package it together with the testing suite? Does that testing suite exist as a separate project? I'm interested in design guidance about how to store and manage this test data as the application features evolve.

Read the article
Data Management Business Continuity Planning

Business Continuity Governance In order to ensure data continuity for an organization, they need to ensure they know how to handle a data or network emergency because all systems have the potential to fail. Data Continuity Checklist: Disaster Recovery Plan/Policy Backups Redundancy Trained Staff Business Continuity Policies In order to protect data in case of any emergency a company needs to put in place a Disaster recovery plan and policies that can be executed by IT staff to ensure the continuity of the existing data and/or limit the amount of data that is not contiguous. A disaster recovery plan is a comprehensive statement of consistent actions to be taken before, during and after a disaster, according to Geoffrey H. Wold. He also states that the primary objective of disaster recovery planning is to protect the organization in the event that all or parts of its operations and/or computer services are rendered unusable. Furthermore, companies can mandate through policies that IT must maintain redundant hardware in case of any hardware failures and redundant network connectivity incase the primary internet service provider goes down. Additionally, they can require that all staff be trained in regards to the Disaster recovery policy to ensure that all parties evolved are knowledgeable to execute the recovery plan. Business Continuity Procedures Business continuity procedure vary from organization to origination, however there are standard procedures that most originations should follow. Standard Business Continuity Procedures Backup and Test Backups to ensure that they work Hire knowledgeable and trainable staff Offer training on new and existing systems Regularly monitor, test, maintain, and upgrade existing system hardware and applications Maintain redundancy regarding all data, and critical business functionality

Read the article
How to choose how to store data?

- by Eldros

Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime. - Chinese Proverb I could ask what kind of data storage I should use for my actual project, but I want to learn to fish, so I don't need to ask for a fish each time I begin a new project. So, until I used two methods to store data on my non-game project: XML files, and relational databases. I know that there is also other kind of database, of the NoSQL kind. However I wouldn't know if there is more choice available to me, or how to choose in the first place, aside arbitrary picking one. So the question is the following: How should I choose the kind of data storage for a game project? And I would be interested on the following criterion when choosing: The size of the project. The platform targeted by the game. The complexity of the data structure. Added Portability of data amongst many project. Added How often should the data be accessed Added Multiple type of data for a same application Any other point you think is of interest when deciding what to use. EDIT I know about Would it be better to use XML/JSON/Text or a database to store game content?, but thought it didn't address exactly my point. Now if I am wrong, I would gladely be shown the error in my ways.

Read the article
Data Masking Pack 12.1.0.3 Certified with E-Business Suite 12.1.3

- by Elke Phelps (Oracle Development)

I'm pleased to announce the certification of the E-Business Suite 12.1.3 Data Masking Template for the Data Masking Pack with Enterprise Manager Cloud Control 12.1.0.3. You can use the Oracle Data Masking Pack with Oracle Enterprise Manager Grid Control 12c to scramble sensitive data in cloned E-Business Suite environments. You may scramble data in E-Business Suite cloned environments with EM12.1.0.3 using the following template: E-Business Suite 12.1.3 Data Masking Template for Data Masking Pack with EM12c (Patch 18462641) What does data masking do in E-Business Suite environments? Application data masking does the following: De-identify the data: Scramble identifiers of individuals, also known as personally identifiable information or PII. Examples include information such as name, account, address, location, and driver's license number. Mask sensitive data: Mask data that, if associated with personally identifiable information (PII), would cause privacy concerns. Examples include compensation, health and employment information. Maintain data validity: Provide a fully functional application. How can EBS customers use data masking? The Oracle E-Business Suite Template for Data Masking Pack can be used in situations where confidential or regulated data needs to be shared with other non-production users who need access to some of the original data, but not necessarily every table. Examples of non-production users include internal application developers or external business partners such as offshore testing companies, suppliers or customers. Due to data dependencies, scrambling E-Business Suite data is not a trivial task. The data needs to be scrubbed in such a way that allows the application to continue to function. The template works with the Oracle Data Masking Pack and Oracle Enterprise Manager to obscure sensitive E-Business Suite information that is copied from production to non-production environments. The Oracle E-Business Suite Template for Data Masking Pack is applied to a non-production environment with the Enterprise Manager Grid Control Data Masking Pack. When applied, the Oracle E-Business Suite Template for Data Masking Pack will create an irreversibly scrambled version of your production database for development and testing. Is there a charge for this? Yes. You must purchase licenses for the Oracle Data Masking Pack to use the Oracle E-Business Suite 12.1.3 template. The Oracle E-Business Suite 12.1.3 Template for the Data Masking Pack is included with the Oracle Data Masking Pack license. You can contact your Oracle account manager for more details about licensing. References Additional details and requirements are provided in the following My Oracle Support Note: Using Oracle E-Business Suite Release 12.1.3 Template for the Data Masking Pack with Oracle Enterprise Manager 12.1 Data Masking Tool (Note 1481916.1) Masking Sensitive Data in the Oracle Database Real Application Testing User's Guide 11g Release 2 (11.2) Related Articles Scrambling Sensitive Data in E-Business Suite E-Business Suite 12.1.3 Data Masking Certified with Enterprise Manager 12c

Read the article
Data Profiling without SSIS

Strangely enough for a predominantly SSIS blog, this post is all about how to perform data profiling without using SSIS. Whilst the Data Profiling Task is a worthy addition, there are a couple of limitations I’ve encountered of late. The first is that it requires SQL Server 2008, and not everyone is there yet. The second is that it can only target SQL Server 2005 and above. What about older systems, which are the ones that we probably need to investigate the most, or other vendor databases such as Oracle? With these limitations in mind I did some searching to find a quick and easy alternative to help me perform some data profiling for a project I was working on recently. I only had SQL Server 2005 available, and anyway most of my target source systems were Oracle, and of course I had short timescales. I looked at several options. Some never got beyond the download stage, they failed to install or just did not run, and others provided less than I could have produced myself by spending 2 minutes writing some basic SQL queries. In the end I settled on an open source product called DataCleaner. To quote from their website: DataCleaner is an Open Source application for profiling, validating and comparing data. These activities help you administer and monitor your data quality in order to ensure that your data is useful and applicable to your business situation. DataCleaner is the free alternative to software for master data management (MDM) methodologies, data warehousing (DW) projects, statistical research, preparation for extract-transform-load (ETL) activities and more. DataCleaner is developed in Java and licensed under LGPL. As quoted above it claims to support profiling, validating and comparing data, but I didn’t really get past the profiling functions, so won’t comment on the other two. The profiling whilst not prefect certainly saved some time compared to the limited alternatives. The ability to profile heterogeneous data sources is a big advantage over the SSIS option, and I found it overall quite easy to use and performance was good. I could see it struggling at times, but actually for what it does I was impressed. It had some data type niggles with Oracle, and some metrics seem a little strange, although thankfully they were easy to augment with some SQL queries to ensure a consistent picture. The report export options didn’t do it for me, but copy and paste with a bit of Excel magic was sufficient. One initial point for me personally is that I have had limited exposure to things of the Java persuasion and whilst I normally get by fine, sometimes the simplest things can throw me. For example installing a JDBC driver, why do I have to copy files to make it all work, has nobody ever heard of an MSI? In case there are other people out there like me who have become totally indoctrinated with the Microsoft software paradigm, I’ve written a quick start guide that details every step required. Steps 1- 5 are the key ones, the rest is really an excuse for some screenshots to show you the tool. Quick Start Guide Step 1 - Download Data Cleaner. The Microsoft Windows zipped exe option, and I chose the latest stable build, currently DataCleaner 1.5.3 (final). Extract the files to a suitable location. Step 2 - Download Java. If you try and run datacleaner.exe without Java it will warn you, and then open your default browser and take you to the Java download site. Follow the installation instructions from there, normally just click Download Java a couple of times and you’re done. Step 3 - Download Microsoft SQL Server JDBC Driver. You may have SQL Server installed, but you won’t have a JDBC driver. Version 3.0 is the latest as of April 2010. There is no real installer, we are in the Java world here, but run the exe you downloaded to extract the files. The default Unzip to folder is not much help, so try a fully qualified path such as C:\Program Files\Microsoft SQL Server JDBC Driver 3.0\ to ensure you can find the files afterwards. Step 4 - If you wish to use Windows Authentication to connect to your SQL Server then first we need to copy a file so that Data Cleaner can find it. Browse to the JDBC extract location from Step 3 and drill down to the file sqljdbc_auth.dll. You will have to choose the correct directory for your processor architecture. e.g. C:\Program Files\Microsoft SQL Server JDBC Driver 3.0\sqljdbc_3.0\enu\auth\x86\sqljdbc_auth.dll. Now copy this file to the Data Cleaner extract folder you chose in Step 1. An alternative method is to edit datacleaner.cmd in the data cleaner extract folder as detailed in this data cleaner wiki topic, but I find copying the file simpler. Step 5 – Now lets run Data Cleaner, just run datacleaner.exe from the extract folder you chose in Step 1. Step 6 – Complete or skip the registration screen, and ignore the task window for now. In the main window click settings. Step 7 – In the Settings dialog, select the Database drivers tab, then click Register database driver and select the Local JAR file option. Step 8 – Browse to the JDBC driver extract location from Step 3 and drill down to select sqljdbc4.jar. e.g. C:\Program Files\Microsoft SQL Server JDBC Driver 3.0\sqljdbc_3.0\enu\sqljdbc4.jar Step 9 – Select the Database driver class as com.microsoft.sqlserver.jdbc.SQLServerDriver, and then click the Test and Save database driver button. Step 10 - You should be back at the Settings dialog with a the list of drivers that includes SQL Server. Just click Save Settings to persist all your hard work. Step 11 – Now we can start to profile some data. In the main Data Cleaner window click New Task, and then Profile from the task window. Step 12 – In the Profile window click Open Database Step 13 – Now choose the SQL Server connection string option. Selecting a connection string gives us a template like jdbc:sqlserver://<hostname>:1433;databaseName=<database>, but obviously it requires some details to be entered for example jdbc:sqlserver://localhost:1433;databaseName=SQLBits. This will connect to the database called SQLBits on my local machine. The port may also have to be changed if using such as when you have a multiple instances of SQL Server running. If using SQL Server Authentication enter a username and password as required and then click Connect to database. You can use Window Authentication, just add integratedSecurity=true to the end of your connection string. e.g jdbc:sqlserver://localhost:1433;databaseName=SQLBits;integratedSecurity=true. If you didn’t complete Step 4 above you will need to do so now and restart Data Cleaner before it will work. Manually setting the connection string is fine, but creating a named connection makes more sense if you will be spending any length of time profiling a specific database. As highlighted in the left-hand screen-shot, at the bottom of the dialog it includes partial instructions on how to create named connections. In the folder shown C:\Users\<Username>\.datacleaner\1.5.3, open the datacleaner-config.xml file in your editor of choice add your own details. You’ll see a sample connection in the file already, just add yours following the same pattern. e.g.  <bean class="dk.eobjects.datacleaner.gui.model.NamedConnection"> <property name="name" value="SQLBits Local Connection" /> <property name="driverClass" value="com.microsoft.sqlserver.jdbc.SQLServerDriver" /> <property name="connectionString" value="jdbc:sqlserver://localhost:1433;databaseName=SQLBits;integratedSecurity=true" /> <property name="tableTypes"> <list> <value>TABLE</value> <value>VIEW</value> </list> </property> </bean> Step 14 – Once back at the Profile window, you should now see your schemas, tables and/or views listed down the left hand side. Browse this tree and double-click a table to select it for profiling. You can then click Add profile, and choose some profiling options, before finally clicking Run profiling. You can see below a sample output for three of the most common profiles, click the image for full size. I hope this has given you a taster for DataCleaner, and should help you get up and running pretty quickly.

Read the article
SQL SERVER – Introduction to Adaptive ETL Tool – How adaptive is your ETL?

- by pinaldave

I am often reminded by the fact that BI/data warehousing infrastructure is very brittle and not very adaptive to change. There are lots of basic use cases where data needs to be frequently loaded into SQL Server or another database. What I have found is that as long as the sources and targets stay the same, SSIS or any other ETL tool for that matter does a pretty good job handling these types of scenarios. But what happens when you are faced with more challenging scenarios, where the data formats and possibly the data types of the source data are changing from customer to customer? Let’s examine a real life situation where a health management company receives claims data from their customers in various source formats. Even though this company supplied all their customers with the same claims forms, they ended up building one-off ETL applications to process the claims for each customer. Why, you ask? Well, it turned out that the claims data from various regional hospitals they needed to process had slightly different data formats, e.g. “integer” versus “string” data field definitions. Moreover the data itself was represented with slight nuances, e.g. “0001124” or “1124” or “0000001124” to represent a particular account number, which forced them, as I eluded above, to build new ETL processes for each customer in order to overcome the inconsistencies in the various claims forms. As a result, they experienced a lot of redundancy in these ETL processes and recognized quickly that their system would become more difficult to maintain over time. So imagine for a moment that you could use an ETL tool that helps you abstract the data formats so that your ETL transformation process becomes more reusable. Imagine that one claims form represents a data item as a string – acc_no(varchar) – while a second claims form represents the same data item as an integer – account_no(integer). This would break your traditional ETL process as the data mappings are hard-wired. But in a world of abstracted definitions, all you need to do is create parallel data mappings to a common data representation used within your ETL application; that is, map both external data fields to a common attribute whose name and type remain unchanged within the application. acc_no(varchar) is mapped to account_number(integer) expressor Studio first claim form schema mapping account_no(integer) is also mapped to account_number(integer) expressor Studio second claim form schema mapping All the data processing logic that follows manipulates the data as an integer value named account_number. Well, these are the kind of problems that that the expressor data integration solution automates for you. I’ve been following them since last year and encourage you to check them out by downloading their free expressor Studio ETL software. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Business Intelligence, Pinal Dave, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: ETL, SSIS

Read the article
SSIS script task pre-compile script into binary code set to false

- by Pramodtech

If I set pre-compile script into binary code to true I get error saying "The task is configured to pre-compile the script, but binary code is not found." If I set this property to False then it works. Will it be a problem after I deploy package on production server? Please advice.

Read the article
SSIS 2005 - How to get the timestamp properties for a file

- by GordyII

How do you get a timestamp from a file with SQL Server Intergration Services 2005?

Read the article
SSIS - How to access a RecordSet variable inside a Script Task

- by GordyII

How do you access a RecordSet variable inside a Script Task?

Read the article
SSIS For Each File Loop and File System Task to copy Files

- by Marlon

I'm using a files system task inside a for each loop container, just as described here: link text However, when I execute the package I get this error: [File System Task] Error: An error occurred with the following error message: "The process cannot access the file 'C:\Book1.xlsx' because it is being used by another process.". I do not have the file open, and I assume no one else does, as I am able to copy, and open, and overwrite the file. Any suggestions would be appreciated. If you want an example package plz let me know.

Read the article
alternatives to SSIS

- by Johannes Rudolph

What is a good,stable and preferably free alternative to SQL Server Integration Services? I'm so tired of this buggy piece of software.

Read the article
Oracle Data Warehouse and Big Data Magazine MAY Edition for Customers + Partners

- by KLaker

Follow us on The latest edition of our monthly data warehouse and big data magazine for Oracle customers and partners is now available. The content for this magazine is taken from the various data warehouse and big data Oracle product management blogs, Oracle press releases, videos posted on Oracle Media Network and Oracle Facebook pages. Click here to view the May Edition Please share this link http://flip.it/fKOUS to our magazine with your customers and partners This magazine is optimized for display on tablets and smartphones using the Flipboard App which is available from the Apple App store and Google Play store

Read the article
The perils of double-dash comments [T-SQL]

- by jamiet

I was checking my Twitter feed on my way in to work this morning and was alerted to an interesting blog post by Valentino Vranken that highlights a problem regarding the OLE DB Source in SSIS. In short, using double-dash comments in SQL statements within the OLE DB Source can cause unexpected results. It really is quite an important read if you’re developing SSIS packages so head over to SSIS OLE DB Source, Parameters And Comments: A Dangerous Mix! and be educated. Note that the problem is solved in SSIS2012 and Valentino explains exactly why. If reading Valentino’s post has switched your brain into “learn mode” perhaps also check out my post SSIS: SELECT *... or select from a dropdown in an OLE DB Source component? which highlights another issue to be aware of when using the OLE DB Source. As I was reading Valentino’s post I was reminded of a slidedeck by Chris Adkin entitled T-SQL Coding Guidelines where he recommends never using double-dash comments: That’s good advice! @Jamiet

Read the article
Version control and data provenance in charts, slides, and marketing materials that derive from code ouput

- by EMS

I develop as part of a small team that mostly does research and statistics stuff. But from the output of our code, other teams often create promotional materials, slides, presentations, etc. We run into a big problem because the marketing team (non-programmers) tend to use Excel, Adobe products, or other tools to carry out their work, and just want easy-to-use data formats from us. This leads to data provenance problems. We see email chains with attachments from 6 months ago and someone is saying "Hey, who generated this data. Can you generate more of it with the recent 6 months of results added in?" I want to help the other teams effectively use version control (my team uses it reasonably well for the code, but every other team classically comes up with many excuses to avoid it). For version controlling a software project where the participants are coders, I have some reasonable understanding of best practices and what to do. But for getting a team of marketing professionals to version control marketing materials and associate metadata about the software used to generate the data for the charts, I'm a bit at a loss. Some of the goals I'd like to achieve: Data that supported a material should never be associated with a person. As in, it should never be the case that someone says "Hey Person XYZ, I see you sent me this data as an attachment 6 months ago, can you update it for me?" Rather, data should be associated with the code and code-version of any code that was used to get it, and perhaps a team of many people who may maintain that code. Then references for data updates are about executing a specific piece of code, with a known version number. I'd like this to be a process that works easily with the tech that the marketing team already uses (e.g. Excel files, Adobe file, whatever). I don't want to burden them with needing to learn a bunch of new stuff just to use version control. They are capable folks, so learning something is fine. Ideally they could use our existing version control framework, but there are some issues around that. I think knowing some general best practices will be enough though, and I can handle patching that into the way our stuff works now. Are there any goals I am failing to think about? What are the time-tested ways to do something like this?

Read the article
Presenting at Roanoke Code Camp Saturday!

- by andyleonard

Introduction I am honored to once again be selected to present at Roanoke Code Camp ! An Introductory Topic One of my presentations is titled "I See a Control Flow Tab. Now What?" It's a Level 100 talk for those wishing to learn how to build their very first SSIS package. This highly-interactive, demo-intense presentation is for beginners and developers just getting started with SSIS. Attend and learn how to build SSIS packages from the ground up . Designing an SSIS Framework I'm also presenting...(read more)

Read the article
What does it mean to treat data as an asset?

What does it mean to treat data as an asset? When considering this concept, we must define what data is and how it can be considered an asset. Data can easily be defined as a collection of stored truths that are open to interpretation and manipulation. Expanding on this definition, data can be viewed as a set of captured facts, measurements, and ideas used to make decisions. Furthermore, InvestorsWords.com defines asset as any item of economic value owned by an individual or corporation. Now let’s apply this definition of asset to our definition of data, and ask the following question. Can facts, measurements and ideas be items that are of economic value owned by an individual or corporation? The obvious answer is yes; data can be bought and sold like commodities or analyzed to make smarter business decisions. We can look at the economic value of data in one of two ways. First, data can be sold as a commodity that can take the form of goods like eBooks, Training, Music, Movies, and so on. Customers are willing to pay to gain access to this data for their consumption. This directly implies that there is an economic value for data in the form of a commodity because customers see a value in obtaining it. Secondly data can be used in making smarter business decisions that allow for companies to become more profitable and/or reduce their potential for risk in regards to how they operate. In the past I have worked at companies where we had to analyze previous sales activities in conjunction with current activities to determine how the company was preforming for the quarter. In addition trends can be formulated based on existing data that allow companies to forecast data so that they can make strategic business decisions based sound forecasted data. Companies that truly value their data are constantly trying to grow and upgrade their data and supporting applications because it is the life blood of a company. If we look at an eBook retailer for example, imagine if they lost all of their data. They would be in essence forced out of business because they would have nothing to sell. In turn, if we look at a company that was using data to facilitate better decision making processes and they lost all of their data then they could be losing potential revenue and/ or increasing the company’s losses by making important business decisions virtually in the dark compared to when they were made on solid data.

Read the article
SSISDB Analysis Script on Gist

- by Davide Mauri

I've created two simple, yet very useful, script to extract some useful data to quickly monitor SSIS packages execution in SQL Server 2012 and after.get-ssis-execution-status get-ssis-data-pumped-rows I've started to use gist since it comes very handy, for this "quick'n'dirty" scripts and snippets, and you can find the above scripts and others (hopefully the number will increase over time...I plan to use gist to store all the code snippet I used to store in a dedicated folder on my machine) there.Now, back to the aforementioned scripts. The first one ("get-ssis-execution-status") returns a list of all executed and executing packages along with latest successful and running executions (so that on can have an idea of the expected run time)error messageswarning messages related to duplicate rows found in lookupsthe second one ("get-ssis-data-pumped-rows") returns information on DataFlows status. Here there's something interesting, IMHO. Nothing exceptional, let it be clear, but nonetheless useful: the script extract information on destinations and row sent to destinations right from the messages produced by the DataFlow component. This helps to quickly understand how many rows as been sent and where...without having to increase the logging level.Enjoy! PSI haven't tested it with SQL Server 2014, but AFAIK they should work without problems. Of course any feedback on this is welcome.

Read the article
How to stop ADO Data Service from sending 'MetaData' with the Data

- by ActionFactory

I'm using MS Data Service to send JSON. (Using a View in the DB) Problem is that it sends "__metadata": { "uri":"ALL THE DATA REPEATED"} with the Data - each time. This doubles our Data over the wire each call. THERE MUST BE A WAY TO TURN THIS OFF!!! WHO KNOWS HOW? THX

Read the article
Working with Spatial Data Part II - Plotting Shapes and Storing Geocoded Data

Continuing from Part I of the Spatial Data series we model an in-memory/persistent data repository for storing geocoded data and plot the data.

Read the article

Search Results

Search found 58774 results on 2351 pages for 'ssis data tranformations'.

Page 19/2351 | < Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26 | Next Page >

- by lecodesportif

- by Martin Potthast

- by vegetables

- by Vincent Warmerdam

- by rwong

- by Davin Tryon

- by Eldros

- by Elke Phelps (Oracle Development)

- by pinaldave

- by Pramodtech

- by GordyII

- by GordyII

- by Marlon

- by Johannes Rudolph

- by KLaker

- by jamiet

- by EMS

- by andyleonard

- by Davide Mauri

- by ActionFactory

< Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26 | Next Page >