Search Results

Search found 58965 results on 2359 pages for 'ssis data transformations'.

Page 21/2359 | < Previous Page | 17 18 19 20 21 22 23 24 25 26 27 28 | Next Page >

Return Integer value from SSIS execute SQL Task

- by Bokhari

I am using SQL Server 2005 Business Intelligence Studio and struggling with returning an integer value from a very simple execute SQL Task. For a very simple test, I wrote the SQL Statement as: Select 35 As 'TotalRecords' Then, I specified ResultSet as ResultName = TotalRecords and VariableName = User::TotalRecords When I execute this, the statement is executed but the variable doesn't have the updated value. However, it has the default value that I specified while variable definition. The return of a date variable works, but integer variable isn't working. The type of User::TotalRecords specified is Int32 in a package scope. Thanks for any hints

Read the article
CSV is not creating format problems with integer value when generated through SSIS package

- by paranjai

Hi I am trying to generate a CSV file from DB Query as source one of my column is having datatype nvarchar(50) with values as "01050007029604301001" After the export when the csv file is viewed using Excel the value appears as "1.0500E18" . How can i stop this . Please suggest

Read the article
SSIS Script Component + Helper Assemblies (.dll's)

- by Nev_Rahd

I got a script component which does Transformation / DataType conversions / Creating some calculated columns. All the transform validations / datatype conversion methods and for new column generation is put into custom .dll. As this script component would be same for all other tables, only thing is to define input / ouput columns and apply validation methods on required columns. This all works fine. On production server where do I need to deploy my .dll. Would just putting it into GAC will be enough or need to do something else. Regards

Read the article
Exception Handling in SSIS Script Task

- by Sreejesh Kumar

Is it possible to redirect the Exceptions occurred to another table/log, in Script Task ? If so, How is it to be done ?

Read the article
Implementing a generic repository for WCF data services

- by cibrax

The repository implementation I am going to discuss here is not exactly what someone would call repository in terms of DDD, but it is an abstraction layer that becomes handy at the moment of unit testing the code around this repository. In other words, you can easily create a mock to replace the real repository implementation. The WCF Data Services update for .NET 3.5 introduced a nice feature to support two way data bindings, which is very helpful for developing WPF or Silverlight based application but also for implementing the repository I am going to talk about. As part of this feature, the WCF Data Services Client library introduced a new collection DataServiceCollection<T> that implements INotifyPropertyChanged to notify the data context (DataServiceContext) about any change in the association links. This means that it is not longer necessary to manually set or remove the links in the data context when an item is added or removed from a collection. Before having this new collection, you basically used the following code to add a new item to a collection. Order order = new Order { Name = "Foo" }; OrderItem item = new OrderItem { Name = "bar", UnitPrice = 10, Qty = 1 }; var context = new OrderContext(); context.AddToOrders(order); context.AddToOrderItems(item); context.SetLink(item, "Order", order); context.SaveChanges(); Now, thanks to this new collection, everything is much simpler and similar to what you have in other ORMs like Entity Framework or L2S. Order order = new Order { Name = "Foo" }; OrderItem item = new OrderItem { Name = "bar", UnitPrice = 10, Qty = 1 }; order.Items.Add(item); var context = new OrderContext(); context.AddToOrders(order); context.SaveChanges(); In order to use this new feature, you first need to enable V2 in the data service, and then use some specific arguments in the datasvcutil tool (You can find more information about this new feature and how to use it in this post). DataSvcUtil /uri:"http://localhost:3655/MyDataService.svc/" /out:Reference.cs /dataservicecollection /version:2.0 Once you use those two arguments, the generated proxy classes will use DataServiceCollection<T> rather than a simple ObjectCollection<T>, which was the default collection in V1. There are some aspects that you need to know to use this feature correctly. 1. All the entities retrieved directly from the data context with a query track the changes and report those to the data context automatically. 2. A entity created with “new” does not track any change in the properties or associations. In order to enable change tracking in this entity, you need to do the following trick. public Order CreateOrder() { var collection = new DataServiceCollection<Order>(this.context); var order = new Order(); collection.Add(order); return order; } You basically need to create a collection, and add the entity to that collection with the “Add” method to enable change tracking on that entity. 3. If you need to attach an existing entity (For example, if you created the entity with the “new” operator rather than retrieving it from the data context with a query) to a data context for tracking changes, you can use the “Load” method in the DataServiceCollection. var order = new Order { Id = 1 }; var collection = new DataServiceCollection<Order>(this.context); collection.Load(order); In this case, the order with Id = 1 must exist on the data source exposed by the Data service. Otherwise, you will get an error because the entity did not exist. These cool extensions methods discussed by Stuart Leeks in this post to replace all the magic strings in the “Expand” operation with Expression Trees represent another feature I am going to use to implement this generic repository. Thanks to these extension methods, you could replace the following query with magic strings by a piece of code that only uses expressions. Magic strings, var customers = dataContext.Customers .Expand("Orders") .Expand("Orders/Items") Expressions, var customers = dataContext.Customers .Expand(c => c.Orders.SubExpand(o => o.Items)) That query basically returns all the customers with their orders and order items. Ok, now that we have the automatic change tracking support and the expression support for explicitly loading entity associations, we are ready to create the repository. The interface for this repository looks like this,public interface IRepository { T Create<T>() where T : new(); void Update<T>(T entity); void Delete<T>(T entity); IQueryable<T> RetrieveAll<T>(params Expression<Func<T, object>>[] eagerProperties); IQueryable<T> Retrieve<T>(Expression<Func<T, bool>> predicate, params Expression<Func<T, object>>[] eagerProperties); void Attach<T>(T entity); void SaveChanges(); } The Retrieve and RetrieveAll methods are used to execute queries against the data service context. While both methods receive an array of expressions to load associations explicitly, only the Retrieve method receives a predicate representing the “where” clause. The following code represents the final implementation of this repository.public class DataServiceRepository: IRepository { ResourceRepositoryContext context; public DataServiceRepository() : this (new DataServiceContext()) { } public DataServiceRepository(DataServiceContext context) { this.context = context; } private static string ResolveEntitySet(Type type) { var entitySetAttribute = (EntitySetAttribute)type.GetCustomAttributes(typeof(EntitySetAttribute), true).FirstOrDefault(); if (entitySetAttribute != null) return entitySetAttribute.EntitySet; return null; } public T Create<T>() where T : new() { var collection = new DataServiceCollection<T>(this.context); var entity = new T(); collection.Add(entity); return entity; } public void Update<T>(T entity) { this.context.UpdateObject(entity); } public void Delete<T>(T entity) { this.context.DeleteObject(entity); } public void Attach<T>(T entity) { var collection = new DataServiceCollection<T>(this.context); collection.Load(entity); } public IQueryable<T> Retrieve<T>(Expression<Func<T, bool>> predicate, params Expression<Func<T, object>>[] eagerProperties) { var entitySet = ResolveEntitySet(typeof(T)); var query = context.CreateQuery<T>(entitySet); foreach (var e in eagerProperties) { query = query.Expand(e); } return query.Where(predicate); } public IQueryable<T> RetrieveAll<T>(params Expression<Func<T, object>>[] eagerProperties) { var entitySet = ResolveEntitySet(typeof(T)); var query = context.CreateQuery<T>(entitySet); foreach (var e in eagerProperties) { query = query.Expand(e); } return query; } public void SaveChanges() { this.context.SaveChanges(SaveChangesOptions.Batch); } } For instance, you can use the following code to retrieve customers with First name equal to “John”, and all their orders in a single call. repository.Retrieve<Customer>( c => c.FirstName == “John”, //Where c => c.Orders.SubExpand(o => o.Items)); In case, you want to have some pre-defined queries that you are going to use across several places, you can put them in an specific class. public static class CustomerQueries { public static Expression<Func<Customer, bool>> LastNameEqualsTo(string lastName) { return c => c.LastName == lastName; } } And then, use it with the repository. repository.Retrieve<Customer>( CustomerQueries.LastNameEqualsTo("foo"), c => c.Orders.SubExpand(o => o.Items));

Read the article
data structure for counting frequencies in a database table-like format

- by user373312

i was wondering if there is a data structure optimized to count frequencies against data that is stored in a database table-like format. for example, the data comes in a (comma) delimited format below. col1, col2, col3 x, a, green x, b, blue ... y, c, green now i simply want to count the frequency of col1=x or col1=x and col2=green. i have been storing the data in a database table, but in my profiling and from empirical observation, database connection is the bottle-neck. i have tried using in-memory database solutions too, and that works quite well; the only problem is memory requirements and quirky init/destroy calls. also, i work mainly with java, but have experience with .net, and was wondering if there was any api to work with "tabular" data in a linq way using java. any help is appreciated.

Read the article
Oracle Big Data Software Downloads

- by Mike.Hallett(at)Oracle-BI&EPM

Companies have been making business decisions for decades based on transactional data stored in relational databases. Beyond that critical data, is a potential treasure trove of less structured data: weblogs, social media, email, sensors, and photographs that can be mined for useful information. Oracle offers a broad integrated portfolio of products to help you acquire and organize these diverse data sources and analyze them alongside your existing data to find new insights and capitalize on hidden relationships. Oracle Big Data Connectors Downloads here, includes: Oracle SQL Connector for Hadoop Distributed File System Release 2.1.0 Oracle Loader for Hadoop Release 2.1.0 Oracle Data Integrator Companion 11g Oracle R Connector for Hadoop v 2.1 Oracle Big Data Documentation The Oracle Big Data solution offers an integrated portfolio of products to help you organize and analyze your diverse data sources alongside your existing data to find new insights and capitalize on hidden relationships. Oracle Big Data, Release 2.2.0 - E41604_01 zip (27.4 MB) Integrated Software and Big Data Connectors User's Guide HTML PDF Oracle Data Integrator (ODI) Application Adapter for Hadoop Apache Hadoop is designed to handle and process data that is typically from data sources that are non-relational and data volumes that are beyond what is handled by relational databases. Typical processing in Hadoop includes data validation and transformations that are programmed as MapReduce jobs. Designing and implementing a MapReduce job usually requires expert programming knowledge. However, when you use Oracle Data Integrator with the Application Adapter for Hadoop, you do not need to write MapReduce jobs. Oracle Data Integrator uses Hive and the Hive Query Language (HiveQL), a SQL-like language for implementing MapReduce jobs. Employing familiar and easy-to-use tools and pre-configured knowledge modules (KMs), the application adapter provides the following capabilities: Loading data into Hadoop from the local file system and HDFS Performing validation and transformation of data within Hadoop Loading processed data from Hadoop to an Oracle database for further processing and generating reports Oracle Database Loader for Hadoop Oracle Loader for Hadoop is an efficient and high-performance loader for fast movement of data from a Hadoop cluster into a table in an Oracle database. It pre-partitions the data if necessary and transforms it into a database-ready format. Oracle Loader for Hadoop is a Java MapReduce application that balances the data across reducers to help maximize performance. Oracle R Connector for Hadoop Oracle R Connector for Hadoop is a collection of R packages that provide: Interfaces to work with Hive tables, the Apache Hadoop compute infrastructure, the local R environment, and Oracle database tables Predictive analytic techniques, written in R or Java as Hadoop MapReduce jobs, that can be applied to data in HDFS files You install and load this package as you would any other R package. Using simple R functions, you can perform tasks such as: Access and transform HDFS data using a Hive-enabled transparency layer Use the R language for writing mappers and reducers Copy data between R memory, the local file system, HDFS, Hive, and Oracle databases Schedule R programs to execute as Hadoop MapReduce jobs and return the results to any of those locations Oracle SQL Connector for Hadoop Distributed File System Using Oracle SQL Connector for HDFS, you can use an Oracle Database to access and analyze data residing in Hadoop in these formats: Data Pump files in HDFS Delimited text files in HDFS Hive tables For other file formats, such as JSON files, you can stage the input in Hive tables before using Oracle SQL Connector for HDFS. Oracle SQL Connector for HDFS uses external tables to provide Oracle Database with read access to Hive tables, and to delimited text files and Data Pump files in HDFS. Related Documentation Cloudera's Distribution Including Apache Hadoop Library HTML Oracle R Enterprise HTML Oracle NoSQL Database HTML Recent Blog Posts Big Data Appliance vs. DIY Price Comparison Big Data: Architecture Overview Big Data: Achieve the Impossible in Real-Time Big Data: Vertical Behavioral Analytics Big Data: In-Memory MapReduce Flume and Hive for Log Analytics Building Workflows in Oozie

Read the article
The Ins and Outs of Effective Smart Grid Data Management

- by caroline.yu

Oracle Utilities and Accenture recently sponsored a one-hour Web cast entitled, "The Ins and Outs of Effective Smart Grid Data Management." Oracle and Accenture created this Web cast to help utilities better understand the types of data collected over smart grid networks and the issues associated with mapping out a coherent information management strategy. The Web cast also addressed important points that utilities must consider with the imminent flood of data that both present and next-generation smart grid components will generate. The three speakers, including Oracle Utilities' Brad Williams, focused on the key factors associated with taking the millions of data points captured in real time and implementing the strategies, frameworks and technologies that enable utilities to process, store, analyze, visualize, integrate, transport and transform data into the information required to deliver targeted business benefits. The Web cast replay is available here. The Web cast slides are available here.

Read the article
What Works in Data Integration?

- by dain.hansen

TDWI just recently put out this paper on "What Works in Data Integration". I invite you especially to take a look at the section on "Accelerating your Business with Real-time Data Integration" and the DIRECTV case study. The article discusses some of the technology considerations for BI/DW and how data integration plays a role to deliver timely, accessible, and high-quality data. It goes on to outline the three key requirements for how to deliver high performance, low impact, and reliability and how that can translate to faster results. The DIRECTV webinar is something you definitely want to take a look at, you'll hear how DIRECTV successfully transformed their data warehouse investments into a competitive advantage with Oracle GoldenGate.

Read the article
How to Achieve Real-Time Data Protection and Availabilty....For Real

- by JoeMeeks

There is a class of business and mission critical applications where downtime or data loss have substantial negative impact on revenue, customer service, reputation, cost, etc. Because the Oracle Database is used extensively to provide reliable performance and availability for this class of application, it also provides an integrated set of capabilities for real-time data protection and availability. Active Data Guard, depicted in the figure below, is the cornerstone for accomplishing these objectives because it provides the absolute best real-time data protection and availability for the Oracle Database. This is a bold statement, but it is supported by the facts. It isn’t so much that alternative solutions are bad, it’s just that their architectures prevent them from achieving the same levels of data protection, availability, simplicity, and asset utilization provided by Active Data Guard. Let’s explore further. Backups are the most popular method used to protect data and are an essential best practice for every database. Not surprisingly, Oracle Recovery Manager (RMAN) is one of the most commonly used features of the Oracle Database. But comparing Active Data Guard to backups is like comparing apples to motorcycles. Active Data Guard uses a hot (open read-only), synchronized copy of the production database to provide real-time data protection and HA. In contrast, a restore from backup takes time and often has many moving parts - people, processes, software and systems – that can create a level of uncertainty during an outage that critical applications can’t afford. This is why backups play a secondary role for your most critical databases by complementing real-time solutions that can provide both data protection and availability. Before Data Guard, enterprises used storage remote-mirroring for real-time data protection and availability. Remote-mirroring is a sophisticated storage technology promoted as a generic infrastructure solution that makes a simple promise – whatever is written to a primary volume will also be written to the mirrored volume at a remote site. Keeping this promise is also what causes data loss and downtime when the data written to primary volumes is corrupt – the same corruption is faithfully mirrored to the remote volume making both copies unusable. This happens because remote-mirroring is a generic process. It has no intrinsic knowledge of Oracle data structures to enable advanced protection, nor can it perform independent Oracle validation BEFORE changes are applied to the remote copy. There is also nothing to prevent human error (e.g. a storage admin accidentally deleting critical files) from also impacting the remote mirrored copy. Remote-mirroring tricks users by creating a false impression that there are two separate copies of the Oracle Database. In truth; while remote-mirroring maintains two copies of the data on different volumes, both are part of a single closely coupled system. Not only will remote-mirroring propagate corruptions and administrative errors, but the changes applied to the mirrored volume are a result of the same Oracle code path that applied the change to the source volume. There is no isolation, either from a storage mirroring perspective or from an Oracle software perspective. Bottom line, storage remote-mirroring lacks both the smarts and isolation level necessary to provide true data protection. Active Data Guard offers much more than storage remote-mirroring when your objective is protecting your enterprise from downtime and data loss. Like remote-mirroring, an Active Data Guard replica is an exact block for block copy of the primary. Unlike remote-mirroring, an Active Data Guard replica is NOT a tightly coupled copy of the source volumes - it is a completely independent Oracle Database. Active Data Guard’s inherent knowledge of Oracle data block and redo structures enables a separate Oracle Database using a different Oracle code path than the primary to use the full complement of Oracle data validation methods before changes are applied to the synchronized copy. These include: physical check sum, logical intra-block checking, lost write validation, and automatic block repair. The figure below illustrates the stark difference between the knowledge that remote-mirroring can discern from an Oracle data block and what Active Data Guard can discern. An Active Data Guard standby also provides a range of additional services enabled by the fact that it is a running Oracle Database - not just a mirrored copy of data files. An Active Data Guard standby database can be open read-only while it is synchronizing with the primary. This enables read-only workloads to be offloaded from the primary system and run on the active standby - boosting performance by utilizing all assets. An Active Data Guard standby can also be used to implement many types of system and database maintenance in rolling fashion. Maintenance and upgrades are first implemented on the standby while production runs unaffected at the primary. After the primary and standby are synchronized and all changes have been validated, the production workload is quickly switched to the standby. The only downtime is the time required for user connections to transfer from one system to the next. These capabilities further expand the expectations of availability offered by a data protection solution beyond what is possible to do using storage remote-mirroring. So don’t be fooled by appearances. Storage remote-mirroring and Active Data Guard replication may look similar on the surface - but the devil is in the details. Only Active Data Guard has the smarts, the isolation, and the simplicity, to provide the best data protection and availability for the Oracle Database. Stay tuned for future blog posts that dive into the many differences between storage remote-mirroring and Active Data Guard along the dimensions of data protection, data availability, cost, asset utilization and return on investment. For additional information on Active Data Guard, see: Active Data Guard Technical White Paper Active Data Guard vs Storage Remote-Mirroring Active Data Guard Home Page on the Oracle Technology Network

Read the article
Are there sources of email marketing data available?

- by Gortron

Are sources of email marketing data available to the public? I would like to see email marketing data to see what kind of content a business sends out, the frequency of sending, the number of people emailed, especially the resulting open rates and click through rates. Are businesses willing to share data on their previous email marketing campaigns without divulging their contact list? I would like to use this data to create an application to help businesses create better newsletters by using this data as a benchmark, basically sharing what works and what doesn't for each industry.

Read the article
How to use OO for data analysis? [closed]

- by Konsta

In which ways could object-orientation (OO) make my data analysis more efficient and let me reuse more of my code? The data analysis can be broken up into get data (from db or csv or similar) transform data (filter, group/pivot, ...) display/plot (graph timeseries, create tables, etc.) I mostly use Python and its Pandas and Matplotlib packages for this besides some DB connectivity (SQL). Almost all of my code is a functional/procedural mix. While I have started to create a data object for a certain collection of time series, I wonder if there are OO design patterns/approaches for other parts of the process that might increase efficiency?

Read the article
Markup format or script for data files?

- by Aaron

The game I'm designing will be mainly written in a high level scripting language (leaning towards either Lua or Squirrel) with a C++ core. In addition to scripts I'm also going to need different data files. Many data files will be for static information such as graphical assets and monster types. I'd also want to create and update data files at runtime for user information like option settings and game saves. Can I get away with using plain script files (i.e. .lua or .nut files) for my data files, or is it better to use dedicated markup formats like XML or YAML? If I use script files, loaded separately from my true scripts, then I wouldn't need an extra library to read those files. Scripting languages like Lua also have table syntax that lend themselves towards data definition. On the other hand I'd have to write my own schema check code. These languages also don't seem to support serialization "out of the box" like the markup format libraries do.

Read the article
Winner of the 2012 Government Big Data Solutions Award

- by Jean-Pierre Dijcks

Hot off the press: The winner of the 2012 Government Big Data Solutions Aware is the National Cancer Institute!! Read all the details on CTOLabs.com. A short excerpt to wet your appetite: "... This solution, based on the Oracle Big Data Appliance with the Cloudera Distribution of Apache Hadoop (CDH), leverages capabilities available from the Big Data community today in pioneering ways that can serve a broad range of researchers. The promising approach of this solution is repeatable across many other Big Data challenges for bioinfomatics, making this approach worthy of its selection as the 2012 Government Big Data Solution Award." Read the entire post. Congrats to the entire team!!

Read the article
SQL Server and the XML Data Type : Data Manipulation

The introduction of the xml data type, with its own set of methods for processing xml data, made it possible for SQL Server developers to create columns and variables of the type xml. Deanna Dicken examines the modify() method, which provides for data manipulation of the XML data stored in the xml data type via XML DML statements. Too many SQL Servers to keep up with?Download a free trial of SQL Response to monitor your SQL Servers in just one intuitive interface."The monitoringin SQL Response is excellent." Mike Towery.

Read the article
Getting data from a webpage in a stable and efficient way

- by Mike Heremans

Recently I've learned that using a regex to parse the HTML of a website to get the data you need isn't the best course of action. So my question is simple: What then, is the best / most efficient and a generally stable way to get this data? I should note that: There are no API's There is no other source where I can get the data from (no databases, feeds and such) There is no access to the source files. (Data from public websites) Let's say the data is normal text, displayed in a table in a html page I'm currently using python for my project but a language independent solution/tips would be nice. As a side question: How would you go about it when the webpage is constructed by Ajax calls?

Read the article
What data structure to use / data persistence

- by Dave

I have an app where I need one table of information with the following fields: field 1 - int or char field 2 - string (max 10 char) field 3 - string (max 20 char) field 4 - float I need the program to filter on field 1 based upon a segmented control and select a field 2 from a picker. From this data I need to look up field 4 to use in a calculation. Total records will be about 200. I never see it go above 400 - 500. I am going to use a singleton which I am able to do, I just need help with the structure for this with data persistence. What type of data structure should I use for this and should I use NSNumber, NSString, etc. or old data types like float, Char, etc. I thought about a struct put into an array but there is probably a better way. This is new to me so any help or reference to examples would be great. I also thought about a plist or dictionary but it looks like it is just a lookup and a field which obviously won't work. Core data looked like overkill to me. Also, with any recommendation how should I get initial data into it? I want the user to be able to edit and add to the database. Sorry for the old terms, you can see what generation I am from... Thanks in advance!!!!

Read the article
Weird SSIS Configuration Error

- by Christopher House

I ran into an interesting SSIS issue that I thought I'd share in hopes that it may save someone from bruising their head after repeatedly banging it on the desk like I did. I was trying to setup what I believe is referred to as "indirect configuration" in SSIS. This is where you store your configuration in some repository like a database or a file, then store the location of that repository in an environment variable and use that to configure the connection to your configuration repository. In my specific situation, I was using a SQL database. I had this all working, but for reasons I'll not bore you with, I had to move my SSIS development to a new VM last week. When I got my new VM, I set about creating a new package. I finished up development on the package and started setting up configuration. I created an OLE DB connection that pointed to my configuration table then went through the configuration wizard to have the connection string for this connection set through my environment variable. I then went through the wizard to set another property through a value stored in the configuration table. When I got to the point where you select the connection, my connection wasn't in the list: As you can see in the screen capture above, the ConfigurationDb connection isn't in the list of available SQL connections in the configuration wizard. Strange. I canceled out of the wizard, went to the properties for ConfigurationDb, tested the connection and it was successful. I went back to the wizard again and this time ConfigurationDb was there. I completed the wizard then went to test my package. Unfortunately the package wouldn't run, I got the following error: Unfortunately, googling for this error code didn't help much as none of the results appears related to package configuration. I did notice that when I went back through the package configuration and tried to edit a previously saved config entry, I was getting the following error: I checked the connection string I had stored in my environment variable and noticed that indeed, it did not have a provider name. I didn't recall having included one on my previous VM, but I figured I'd include it just to see what happened. That made no difference at all. After a day and a half of trying to figure out what the problem was, I'm pleased to report that through extensive trial and error, I have resolved the error. As it turns out, the person who setup this new VM for me named the server SQLSERVER2008. This meant my configuration connection string was: Initial Catalog=SSISConfigDb;Data Source=SQLSERVER2008;Integrated Security=SSPI; Just for the heck of it, I tried changing it to: Initial Catalog=SSISConfigDb;Data Source=(local);Integrated Security=SSPI; That did the trick! As soon as I restarted BIDS, I was able to run the package with no errors at all. Crazy. So, the moral of the story is, don't name your server SQLSERVER2008 if you want SSIS configuration to work when using SQL as your config store.

Read the article
SSIS Technique to Remove/Skip Trailer and/or Bad Data Row in a Flat File

- by Compudicted

I noticed that the question on how to skip or bypass a trailer record or a badly formatted/empty row in a SSIS package keeps coming back on the MSDN SSIS Forum. I tried to figure out the reason why and after an extensive search inside the forum and outside it on the entire Web (using several search engines) I indeed found that it seems even thought there is a number of posts and articles on the topic none of them are employing the simplest and the most efficient technique. When I say efficient I mean the shortest time to solution for the fellow developers. OK, enough talk. Let’s face the problem: Typically a flat file (e.g. a comma delimited/CSV) needs to be processed (loaded into a database in most cases really). Oftentimes, such an input file is produced by some sort of an out of control, 3-rd party solution and would come in with some garbage characters and/or even malformed/miss-formatted rows. One such example could be this imaginary file: As you can see several rows have no data and there is an occasional garbage character (1, in this example on row #7). Our task is to produce a clean file that will only capture the meaningful data rows. As an aside, our output/target may be a database table, but for the purpose of this exercise we will simply re-format the source. Let’s outline our course of action to start off: Will use SSIS 2005 to create a DFT; The DFT will use a Flat File Source to our input [bad] flat file; We will use a Conditional Split to process the bad input file; and finally Dump the resulting data to a new [clean] file. Well, only four steps, let’s see if it is too much of work. 1: Start the BIDS and add a DFT to the Control Flow designer (I named it Process Dirty File DFT): 2, and 3: I had added the data viewer to just see what I am getting, alas, surprisingly the data issues were not seen it: What really is the key in the approach it is to properly set the Conditional Split Transformation. Visually it is: and specifically its SSIS Expression LEN([After CS Column 0]) > 1 The point is to employ the right Boolean expression (yes, the Conditional Split accepts only Boolean conditions). For the sake of this post I re-named the Output Name “No Empty Rows”, but by default it will be named Case 1 (remember to drag your first column into the expression area)! You can close your Conditional Split now. The next part will be crucial – consuming the output of our Conditional Split. Last step - #4: Add a Flat File Destination or any other one you need. Click on the Conditional Split and choose the green arrow to drop onto the target. When you do so make sure you choose the No Empty Rows output and NOT the Conditional Split Default Output. Make the necessary mappings. At this point your package must look like: As the last step will run our package to examine the produced output file. F5: and… it looks great!

Read the article
How to export SSIS to Microsoft Excel without additional software?

- by Dr. Zim

This question is long winded because I have been updating the question over a very long time trying to get SSIS to properly export Excel data. I managed to solve this issue, although not correctly. Aside from someone providing a correct answer, the solution listed in this question is not terrible. The only answer I found was to create a single row named range wide enough for my columns. In the named range put sample data and hide it. SSIS appends the data and reads metadata from the single row (that is close enough for it to drop stuff in it). The data takes the format of the hidden single row. This allows headers, etc. WOW what a pain in the butt. It will take over 450 days of exports to recover the time lost. However, I still love SSIS and will continue to use it because it is still way better than Filemaker LOL. My next attempt will be doing the same thing in the report server. Original question notes: If you are in Sql Server Integrations Services designer and want to export data to an Excel file starting on something other than the first line, lets say the forth line, how do you specify this? I tried going in to the Excel Destination of the Data Flow, changed the AccessMode to OpenRowSet from Variable, then set the variable to "YPlatters$A4:I20000" This fails saying it cannot find the sheet. The sheet is called YPlatters. I thought you could specify (Sheet$)(Starting Cell):(Ending Cell)? Update Apparently in Excel you can select a set of cells and name them with the name box. This allows you to select the name instead of the sheet without the $ dollar sign. Oddly enough, whatever the range you specify, it appends the data to the next row after the range. Oddly, as you add data, it increases the named selection's row count. Another odd thing is the data takes the format of the last line of the range specified. My header rows are bold. If I specify a range that ends with the header row, the data appends to the row below, and makes all the entries bold. if you specify one row lower, it puts a blank line between the header row and the data, but the data is not bold. Another update No matter what I try, SSIS samples the "first row" of the file and sets the metadata according to what it finds. However, if you have sample data that has a value of zero but is formatted as the first row, it treats that column as text and inserts numeric values with a single quote in front ('123.34). I also tried headers that do not reflect the data types of the columns. I tried changing the metadata of the Excel destination, but it always changes it back when I run the project, then fails saying it will truncate data. If I tell it to ignore errors, it imports everything except that column. Several days of several hours a piece later... Another update I tried every combination. A mostly working example is to create the named range starting with the column headers. Format your column headers as you want the data to look as the data takes on this format. In my example, these exist from A4 to E4, which is my defined range. SSIS appends to the row after the defined range, so defining A4 to E68 appends the rows starting at A69. You define the Connection as having the first row contains the field names. It takes on the metadata of the header row, oddly, not the second row, and it guesses at the data type, not the formatted data type of the column, i.e., headers are text, so all my metadata is text. If your headers are bold, so is all of your data. I even tried making a sample data row without success... I don't think anyone actually uses Excel with the default MS SSIS export. If you could define the "insert range" (A5 to E5) with no header row and format those columns (currency, not bold, etc.) without it skipping a row in Excel, this would be very helpful. From what I gather, noone uses SSIS to export Excel without a third party connection manager. Any ideas on how to set this up properly so that data is formatted correctly, i.e., the metadata read from Excel is proper to the real data, and formatting inherits from the first row of data, not the headers in Excel? One last update (July 17, 2009) I got this to work very well. One thing I added to Excel was the IMEX=1 in the Excel connection string: "Excel 8.0;HDR=Yes;IMEX=1". This forces Excel (I think) to look at all rows to see what kind of data is in it. Generally, this does not drop information, say for instance if you have a zip code then about 9 rows down you have a zip+4, Excel without this blanks that field entirely without error. With IMEX=1, it recognizes that Zip is actually a character field instead of numeric. And of course, one more update (August 27, 2009) The IMEX=1 will succeed importing data with missing contents in the first 8 rows, but it will fail exporting data where no data exists. So, have it on your import connection string, but not your export Excel connection string. I have to say, after so much fiddling, it works pretty well.

Read the article
Using a "white list" for extracting terms for Text Mining

- by [email protected]

In Part 1 of my post on "Generating cluster names from a document clustering model" (part 1, part 2, part 3), I showed how to build a clustering model from text documents using Oracle Data Miner, which automates preparing data for text mining. In this process we specified a custom stoplist and lexer and relied on Oracle Text to identify important terms. However, there is an alternative approach, the white list, which uses a thesaurus object with the Oracle Text CTXRULE index to allow you to specify the important terms. INTRODUCTIONA stoplist is used to exclude, i.e., black list, specific words in your documents from being indexed. For example, words like a, if, and, or, and but normally add no value when text mining. Other words can also be excluded if they do not help to differentiate documents, e.g., the word Oracle is ubiquitous in the Oracle product literature. One problem with stoplists is determining which words to specify. This usually requires inspecting the terms that are extracted, manually identifying which ones you don't want, and then re-indexing the documents to determine if you missed any. Since a corpus of documents could contain thousands of words, this could be a tedious exercise. Moreover, since every word is considered as an individual token, a term excluded in one context may be needed to help identify a term in another context. For example, in our Oracle product literature example, the words "Oracle Data Mining" taken individually are not particular helpful. The term "Oracle" may be found in nearly all documents, as with the term "Data." The term "Mining" is more unique, but could also refer to the Mining industry. If we exclude "Oracle" and "Data" by specifying them in the stoplist, we lose valuable information. But it we include them, they may introduce too much noise. Still, when you have a broad vocabulary or don't have a list of specific terms of interest, you rely on the text engine to identify important terms, often by computing the term frequency - inverse document frequency metric. (This is effectively a weight associated with each term indicating its relative importance in a document within a collection of documents. We'll revisit this later.) The results using this technique is often quite valuable. As noted above, an alternative to the subtractive nature of the stoplist is to specify a white list, or a list of terms--perhaps multi-word--that we want to extract and use for data mining. The obvious downside to this approach is the need to specify the set of terms of interest. However, this may not be as daunting a task as it seems. For example, in a given domain (Oracle product literature), there is often a recognized glossary, or a list of keywords and phrases (Oracle product names, industry names, product categories, etc.). Being able to identify multi-word terms, e.g., "Oracle Data Mining" or "Customer Relationship Management" as a single token can greatly increase the quality of the data mining results. The remainder of this post and subsequent posts will focus on how to produce a dataset that contains white list terms, suitable for mining. CREATING A WHITE LIST We'll leverage the thesaurus capability of Oracle Text. Using a thesaurus, we create a set of rules that are in effect our mapping from single and multi-word terms to the tokens used to represent those terms. For example, "Oracle Data Mining" becomes "ORACLEDATAMINING." First, we'll create and populate a mapping table called my_term_token_map. All text has been converted to upper case and values in the TERM column are intended to be mapped to the token in the TOKEN column. TERM TOKEN DATA MINING DATAMINING ORACLE DATA MINING ORACLEDATAMINING 11G ORACLE11G JAVA JAVA CRM CRM CUSTOMER RELATIONSHIP MANAGEMENT CRM ... Next, we'll create a thesaurus object my_thesaurus and a rules table my_thesaurus_rules: CTX_THES.CREATE_THESAURUS('my_thesaurus', FALSE); CREATE TABLE my_thesaurus_rules (main_term VARCHAR2(100), query_string VARCHAR2(400)); We next populate the thesaurus object and rules table using the term token map. A cursor is defined over my_term_token_map. As we iterate over the rows, we insert a synonym relationship 'SYN' into the thesaurus. We also insert into the table my_thesaurus_rules the main term, and the corresponding query string, which specifies synonyms for the token in the thesaurus. DECLARE cursor c2 is select token, term from my_term_token_map; BEGIN for r_c2 in c2 loop CTX_THES.CREATE_RELATION('my_thesaurus',r_c2.token,'SYN',r_c2.term); EXECUTE IMMEDIATE 'insert into my_thesaurus_rules values (:1,''SYN(' || r_c2.token || ', my_thesaurus)'')' using r_c2.token; end loop; END; We are effectively inserting the token to return and the corresponding query that will look up synonyms in our thesaurus into the my_thesaurus_rules table, for example: 'ORACLEDATAMINING' SYN ('ORACLEDATAMINING', my_thesaurus)At this point, we create a CTXRULE index on the my_thesaurus_rules table: create index my_thesaurus_rules_idx on my_thesaurus_rules(query_string) indextype is ctxsys.ctxrule; In my next post, this index will be used to extract the tokens that match each of the rules specified. We'll then compute the tf-idf weights for each of the terms and create a nested table suitable for mining.

Read the article
Using Stored Procedures in SSIS

- by dataintegration

The SSIS Data Flow components: the source task and the destination task are the easiest way to transfer data in SSIS. Some data transactions do not fit this model, they are procedural tasks modeled as stored procedures. In this article we show how you can call stored procedures available in RSSBus ADO.NET Providers from SSIS. In this article we will use the CreateJob and the CreateBatch stored procedures available in RSSBus ADO.NET Provider for Salesforce, but the same steps can be used to call a stored procedure in any of our data providers. Step 1: Open Visual Studio and create a new Integration Services Project. Step 2: Add a new Data Flow Task to the Control Flow window. Step 3: Open the Data Flow Task and add a Script Component to the data flow pane. A dialog box will pop-up allowing you to select the Script Component Type: pick the source type as we will be outputting columns from our stored procedure. Step 4: Double click the Script Component to open the editor. Step 5: In the "Inputs and Outputs" settings, enter all the columns you want to output to the data flow. Ensure the correct data type has been set for each output. You can check the data type by selecting the output and then changing the "DataType" property from the property editor. In our example, we'll add the column JobID of type String. Step 6: Select the "Script" option in the left-hand pane and click the "Edit Script" button. This will open a new Visual Studio window with some boiler plate code in it. Step 7: In the CreateOutputRows() function you can add code that executes the stored procedures included with the Salesforce Component. In this example we will be using the CreateJob and CreateBatch stored procedures. You can find a list of the available stored procedures along with their inputs and outputs in the product help. //Configure the connection string to your credentials String connectionString = "Offline=False;user=myusername;password=mypassword;access token=mytoken;"; using (SalesforceConnection conn = new SalesforceConnection(connectionString)) { //Create the command to call the stored procedure CreateJob SalesforceCommand cmd = new SalesforceCommand("CreateJob", conn); cmd.CommandType = CommandType.StoredProcedure; cmd.Parameters.Add(new SalesforceParameter("ObjectName", "Contact")); cmd.Parameters.Add(new SalesforceParameter("Action", "insert")); //Execute CreateJob //CreateBatch requires JobID as input so we store this value for later SalesforceDataReader rdr = cmd.ExecuteReader(); String JobID = ""; while (rdr.Read()) { JobID = (String)rdr["JobID"]; } //Create the command for CreateBatch, for this example we are adding two new rows SalesforceCommand batCmd = new SalesforceCommand("CreateBatch", conn); batCmd.CommandType = CommandType.StoredProcedure; batCmd.Parameters.Add(new SalesforceParameter("JobID", JobID)); batCmd.Parameters.Add(new SalesforceParameter("Aggregate", "<Contact><Row><FirstName>Bill</FirstName>" + "<LastName>White</LastName></Row><Row><FirstName>Bob</FirstName><LastName>Black</LastName></Row></Contact>")); //Execute CreateBatch SalesforceDataReader batRdr = batCmd.ExecuteReader(); } Step 7b: If you had specified output columns earlier, you can now add data into them using the UserComponent Output0Buffer. For example, we had set an output column called JobID of type String so now we can set a value for it. We will modify the DataReader that contains the output of CreateJob like so:. while (rdr.Read()) { Output0Buffer.AddRow(); JobID = (String)rdr["JobID"]; Output0Buffer.JobID = JobID; } Step 8: Note: You will need to modify the connection string to include your credentials. Also ensure that the System.Data.RSSBus.Salesforce assembly is referenced and include the following using statements to the top of the class: using System.Data; using System.Data.RSSBus.Salesforce; Step 9: Once you are done editing your script, save it, and close the window. Click OK in the Script Transformation window to go back to the main pane. Step 10: If had any outputs from the Script Component you can use them in your data flow. For example we will use a Flat File Destination. Configure the Flat File Destination to output the results to a file, and you should see the JobId in the file. Step 11: Your project should be ready to run.

Read the article
ADO.NET (WCF) Data Services Query Interceptor Hangs IIS

- by PreMagination

I have an ADO.NET Data Service that's supposed to provide read-only access to a somewhat complex database. Logically I have table-per-type (TPT) inheritance in my data model but the EDM doesn't implement inheritance. (Limitation of EF and navigation properties on derived types. STILL not fixed in EF4!) I can query my EDM directly (using a separate project) using a copy of the query I'm trying to run against the web service, results are returned within 10 seconds. Disabling the query interceptors I'm able to make the same query against the web service, results are returned similarly quickly. I can enable some of the query interceptors and the results are returned slowly, up to a minute or so later. Alternatively, I can enable all the query interceptors, expand less of the properties on the main object I'm querying, and results are returned in a similar period of time. (I've increased some of the timeout periods) Up til this point Sql Profiler indicates the slow-down is the database. (That's a post for a different day) But when I enable all my query interceptors and expand all the properties I'd like to have the IIS worker process pegs the CPU for 20 minutes and a query is never even made against the database. This implies to me that yes, my implementation probably sucks but regardless the Data Services "tier" is having an issue it shouldn't. WCF tracing didn't reveal anything interesting to my untrained eye. Details: Data model: Agent-Person-Student Student has a collection of referrals Students and referrals are private, queries against the web service should only return "your" students and referrals. This means Person and Agent need to be filtered too. Other entities (Agent-Organization-School) can be accessed by anyone who has authenticated. The existing security model is poorly suited to perform this type of filtering for this type of data access, the query interceptors are complicated and cause EF to generate some entertaining sql queries. Sample Interceptor [QueryInterceptor("Agents")] public Expression<Func<Agent, Boolean>> OnQueryAgents() { //Agent is a Person(1), Educator(2), Student(3), or Other Person(13); allow if scope permissions exist return ag => (ag.AgentType.AgentTypeId == 1 || ag.AgentType.AgentTypeId == 2 || ag.AgentType.AgentTypeId == 3 || ag.AgentType.AgentTypeId == 13) && ag.Person.OrganizationPersons.Count<OrganizationPerson>(op => op.Organization.ScopePermissions.Any<ScopePermission> (p => p.ApplicationRoleAccount.Account.UserName == HttpContext.Current.User.Identity.Name && p.ApplicationRoleAccount.Application.ApplicationId == 124) || op.Organization.HierarchyDescendents.Any<OrganizationsHierarchy>(oh => oh.AncestorOrganization.ScopePermissions.Any<ScopePermission> (p => p.ApplicationRoleAccount.Account.UserName == HttpContext.Current.User.Identity.Name && p.ApplicationRoleAccount.Application.ApplicationId == 124))) > 0; } The query interceptors for Person, Student, Referral are all very similar, ie they traverse multiple same/similar tables to look for ScopePermissions as above. Sample Query var referrals = (from r in service.Referrals .Expand("Organization/ParentOrganization") .Expand("Educator/Person/Agent") .Expand("Student/Person/Agent") .Expand("Student") .Expand("Grade") .Expand("ProblemBehavior") .Expand("Location") .Expand("Motivation") .Expand("AdminDecision") .Expand("OthersInvolved") where r.DateCreated >= coupledays && r.DateDeleted == null select r); Any suggestions or tips would be greatly associated, for fixing my current implementation or in developing a new one, with the caveat that the database can't be changed and that ultimately I need to expose a large portion of the database via a web service that limits data access to the data authorized for, for the purpose of data integration with multiple outside parties. THANK YOU!!!

Read the article
make-like build tools for data?

- by miku

Make is a standard tools for building software. But make decides whether a target needs to be regenerated by comparing file modification times. Are there any proven, preferably small tools that handle builds not for software but for data? Something that regenerates targets not only on mod times but on certain other properties (e.g. completeness). (Or alternatively some paper that describes such a tool.) As illustration: I'd like to automate the following process: get data (e.g. a tarball) from some regularly updated source copy somewhere if it's not there (based e.g. on some filename-scheme) convert the files to different format (but only if there aren't successfully converted ones there - e.g. from a previous attempt - custom comparison routine) for each file find a certain data element and fetch some additional file from say an URL, but only if that hasn't been downloaded yet (decide on existence of file and file "freshness") finally compute something (e.g. word count for something identifiable and store it in the database, but only if the DB does not have an entry for that exact ID yet) Observations: there are different stages each stage is usually simple to compute or implement in isolation each stage may be simple, but the data volume may be large each stage may produce a few errors each stage may have different signals, on when (re)processing is needed Requirements: builds should be interruptable and idempotent (== robust) when interrupted, already processed objects should be reused to speedup the next run data paths should be easy to adjust (simple syntax, nothing new to learn, internal dsl would be ok) some form of dependency graph, that describes the process would be nice for later visualizations should leverage existing programs, if possible I've done some research on make alternatives like rake and have worked a lot with ant and maven in the past. All these tools naturally focus on code and software build, not on data builds. A system we have in place now for a task similar to the above is pretty much just shell scripts, which are compact (and are a ok glue for a variety of other programs written in other languages), so I wonder if worse is better?

Read the article
SQL SERVER – Installing Data Quality Services (DQS) on SQL Server 2012

- by pinaldave

Data Quality Services is very interesting enhancements in SQL Server 2012. My friend and SQL Server Expert Govind Kanshi have written an excellent article on this subject earlier on his blog. Yesterday I stumbled upon his blog one more time and decided to experiment myself with DQS. I have basic understanding of DQS and MDS so I knew I need to start with DQS Client. However, when I tried to find DQS Client I was not able to find it under SQL Server 2012 installation. I quickly realized that I needed to separately install the DQS client. You will find the DQS installer under SQL Server 2012 >> Data Quality Services directory. The pre-requisite of DQS is Master Data Services (MDS) and IIS. If you have not installed IIS, you can follow the simple steps and install IIS in your machine. Once the pre-requisites are installed, click on MDS installer once again and it will install DQS just fine. Be patient with the installer as it can take a bit longer time if your machine is low on configurations. Once the installation is over you will be able to expand SQL Server 2012 >> Data Quality Services directory and you will notice that it will have a new item called Data Quality Client. Click on it and it will open the client. Well, in future blog post we will go over more details about DQS and detailed practical examples. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Utility, T SQL, Technology Tagged: Data Quality Services

Read the article

Search Results

Search found 58965 results on 2359 pages for 'ssis data transformations'.

Page 21/2359 | < Previous Page | 17 18 19 20 21 22 23 24 25 26 27 28 | Next Page >

- by Bokhari

- by paranjai

- by Nev_Rahd

- by Sreejesh Kumar

- by cibrax

- by user373312

- by Mike.Hallett(at)Oracle-BI&EPM

- by caroline.yu

- by dain.hansen

- by JoeMeeks

- by Gortron

- by Konsta

- by Aaron

- by Jean-Pierre Dijcks

- by Mike Heremans

- by Dave

- by Christopher House

- by Compudicted

- by Dr. Zim

- by [email protected]

- by dataintegration

- by PreMagination

- by miku

- by pinaldave

< Previous Page | 17 18 19 20 21 22 23 24 25 26 27 28 | Next Page >