etl - Page 3 - Developer IT

SSIS (missing) Pre-Build and Post-Build

- by Raj More

For the warehouse work under progress, we have a single solution with multiple projects in it OLTP Database Project Warehouse Database Project SSIS ETL project After the SSIS project is built, I want to move the binaries (XML, really) from the Bin folder to "C:\AutomatedTasks\ETL.Warehouse\" and "C:\AutomatedTasks\ETL" I cannot find the Post-Build events to do that for the SSIS project. Where are they? If they aren't available, how do I achieve this?

Read the article

Combination of Operating Mode and Commit Strategy

- by Kevin Yang

If you want to populate a source into multiple targets, you may also want to ensure that every row from the source affects all targets uniformly (or separately). Let’s consider the Example Mapping below. If a row from SOURCE causes different changes in multiple targets (TARGET_1, TARGET_2 and TARGET_3), for example, it can be successfully inserted into TARGET_1 and TARGET_3, but failed to be inserted into TARGET_2, and the current Mapping Property TLO (target load order) is “TARGET_1 -> TARGET_2 -> TARGET_3”. What should Oracle Warehouse Builder do, in order to commit the appropriate data to all affected targets at the same time? If it doesn’t behave as you intended, the data could become inaccurate and possibly unusable. Example Mapping In OWB, we can use Mapping Configuration Commit Strategies and Operating Modes together to achieve this kind of requirements. Below we will explore the combination of these two features and how they affect the results in the target tables Before going to the example, let’s review some of the terms we will be using (Details can be found in white paper Oracle® Warehouse Builder Data Modeling, ETL, and Data Quality Guide11g Release 2): Operating Modes: Set-Based Mode: Warehouse Builder generates a single SQL statement that processes all data and performs all operations. Row-Based Mode: Warehouse Builder generates statements that process data row by row. The select statement is in a SQL cursor. All subsequent statements are PL/SQL. Row-Based (Target Only) Mode: Warehouse Builder generates a cursor select statement and attempts to include as many operations as possible in the cursor. For each target, Warehouse Builder inserts each row into the target separately. Commit Strategies: Automatic: Warehouse Builder loads and then automatically commits data based on the mapping design. If the mapping has multiple targets, Warehouse Builder commits and rolls back each target separately and independently of other targets. Use the automatic commit when the consequences of multiple targets being loaded unequally are not great or are irrelevant. Automatic correlated: It is a specialized type of automatic commit that applies to PL/SQL mappings with multiple targets only. Warehouse Builder considers all targets collectively and commits or rolls back data uniformly across all targets. Use the correlated commit when it is important to ensure that every row in the source affects all affected targets uniformly. Manual: select manual commit control for PL/SQL mappings when you want to interject complex business logic, perform validations, or run other mappings before committing data. Combination of the commit strategy and operating mode To understand the effects of each combination of operating mode and commit strategy, I’ll illustrate using the following example Mapping. Firstly we insert 100 rows into the SOURCE table and make sure that the 99th row and 100th row have the same ID value. And then we create a unique key constraint on ID column for TARGET_2 table. So while running the example mapping, OWB tries to load all 100 rows to each of the targets. But the mapping should fail to load the 100th row to TARGET_2, because it will violate the unique key constraint of table TARGET_2. With different combinations of Commit Strategy and Operating Mode, here are the results ¦ Set-based/ Correlated Commit: Configuration of Example mapping: Result: What’s happening: A single error anywhere in the mapping triggers the rollback of all data. OWB encounters the error inserting into Target_2, it reports an error for the table and does not load the row. OWB rolls back all the rows inserted into Target_1 and does not attempt to load rows to Target_3. No rows are added to any of the target tables. ¦ Row-based/ Correlated Commit: Configuration of Example mapping: Result: What’s happening: OWB evaluates each row separately and loads it to all three targets. Loading continues in this way until OWB encounters an error loading row 100th to Target_2. OWB reports the error and does not load the row. It rolls back the row 100th previously inserted into Target_1 and does not attempt to load row 100 to Target_3. Then, if there are remaining rows, OWB will continue loading them, resuming with loading rows to Target_1. The mapping completes with 99 rows inserted into each target. ¦ Set-based/ Automatic Commit: Configuration of Example mapping: Result: What’s happening: When OWB encounters the error inserting into Target_2, it does not load any rows and reports an error for the table. It does, however, continue to insert rows into Target_3 and does not roll back the rows previously inserted into Target_1. The mapping completes with one error message for Target_2, no rows inserted into Target_2, and 100 rows inserted into Target_1 and Target_3 separately. ¦ Row-based/Automatic Commit: Configuration of Example mapping: Result: What’s happening: OWB evaluates each row separately for loading into the targets. Loading continues in this way until OWB encounters an error loading row 100 to Target_2 and reports the error. OWB does not roll back row 100th from Target_1, does insert it into Target_3. If there are remaining rows, it will continue to load them. The mapping completes with 99 rows inserted into Target_2 and 100 rows inserted into each of the other targets. Note: Automatic Correlated commit is not applicable for row-based (target only). If you design a mapping with the row-based (target only) and correlated commit combination, OWB runs the mapping but does not perform the correlated commit. In set-based mode, correlated commit may impact the size of your rollback segments. Space for rollback segments may be a concern when you merge data (insert/update or update/insert). Correlated commit operates transparently with PL/SQL bulk processing code. The correlated commit strategy is not available for mappings run in any mode that are configured for Partition Exchange Loading or that include a Queue, Match Merge, or Table Function operator. If you want to practice in your own environment, you can follow the steps: 1. Import the MDL file: commit_operating_mode.mdl 2. Fix the location for oracle module ORCL and deploy all tables under it. 3. Insert sample records into SOURCE table, using below plsql code: begin for i in 1..99 loop insert into source values(i, 'col_'||i); end loop; insert into source values(99, 'col_99'); end; 4. Configure MAPPING_1 to any combinations of operating mode and commit strategy you want to test. And make sure feature TLO of mapping is open. 5. Deploy Mapping “MAPPING_1”. 6. Run the mapping and check the result.

Read the article

ODI and OBIEE 11g Integration

- by David Allan

Here we will see some of the connectivity options to OBIEE 11g using the JDBC driver. You’ll see based upon some connection properties how the physical or presentation layers can be utilized. In the integrators guide for OBIEE 11g you will find a brief statement indicating that there actually is a JDBC driver for OBIEE. In OBIEE 11g its now possible to connect directly to the physical layer, Venkat has an informative post here on this topic. In ODI 11g the Oracle BI technology is shipped with the product along with KMs for reverse engineering, and using OBIEE models for a data source. When you install OBIEE in 11g a light weight demonstration application is preinstalled in the server, when you open this in the BI Administration tool we see the regular 3 panel view within the administration tool. To interrogate this system via JDBC (just like ODI does using the KMs) need a couple of things; the JDBC driver from OBIEE 11g, a java client program and the credentials. In my java client program I want to connect to the OBIEE system, when I connect I can interrogate what the JDBC driver presents for the metadata. The metadata projected via the JDBC connection’s DatabaseMetadata changes depending on whether the property NQ_SESSION.SELECTPHYSICAL is set when the java client connects. Let’s use the sample app to illustrate. I have a java client program here that will print out the tables in the DatabaseMetadata, it will also output the catalog and schema. For example if I execute without any special JDBC properties as follows; java -classpath .;%BIHOMEDIR%\clients\bijdbc.jar meta_jdbc oracle.bi.jdbc.AnaJdbcDriver jdbc:oraclebi://localhost:9703/ weblogic mypass Then I get the following returned representing the presentation layer, the sample I used is XML, and has no schema; Catalog Schema Table Sample Sales Lite null Base Facts Sample Sales Lite null Calculated Facts … Sample Targets Lite null Base Facts … Now if I execute with the only difference being the JDBC property NQ_SESSION.SELECTPHYSICAL with the value Yes, then I see a different set of values representing the physical layer in OBIEE; java -classpath .;%BIHOMEDIR%\clients\bijdbc.jar meta_jdbc oracle.bi.jdbc.AnaJdbcDriver jdbc:oraclebi://localhost:9703/ weblogic mypass NQ_SESSION.SELECTPHYSICAL=Yes The following is returned; Catalog Schema Table Sample App Lite Data null D01 Time Day Grain Sample App Lite Data null F10 Revenue Facts (Order grain) … System DB (Update me) … If this was a database system such as Oracle, the catalog value would be the OBIEE database name and the schema would be the Oracle database schema. Other systems which have real catalog structure such as SQLServer would use its catalog value. Its this ‘Catalog’ and ‘Schema’ value that is important when integration OBIEE with ODI. For the demonstration application in OBIEE 11g, the following illustration shows how the information from OBIEE is related via the JDBC driver through to ODI. In the XML example above, within ODI’s physical schema definition on the right, we leave the schema blank since the XML data source has no schema. When I did this at first, I left the default value that ODI places in the Schema field since which was ‘<Undefined>’ (like image below) but this string is actually used in the RKM so ended up not finding any tables in this schema! Entering an empty string resolved this. Below we see a regular Oracle database example that has the database, schema, physical table structure, and how this is defined in ODI. Remember back to the physical versus presentation layer usage when we passed the special property, well to do this in ODI, the data server has a panel for properties where you can define key/value pairs. So if you want to select physical objects from the OBIEE server, then you must set this property. An additional changed in ODI 11g is the OBIEE connection pool support, this has been implemented via a ‘Connection Pool’ flex field for the Oracle BI data server. So here you set the connection pool name from the OBIEE system that you specifically want to use and this is used by the Oracle BI to Oracle (DBLINK) LKM, so if you are using this you must set this flex field. Hopefully a useful insight into some of the mechanics of how this hangs together.

Read the article

SSIS Design Pattern: Loading Variable-Length Rows

- by andyleonard

Introduction I encounter flat file sources with variable-length rows on occassion. Here, I supply one SSIS Design Pattern for loading them. What's a Variable-Length Row Flat File? Great question - let's start with a definition. A variable-length row flat file is a text source of some flavor - comma-separated values (CSV), tab-delimited file (TDF), or even fixed-length, positional-, or ordinal-based (where the location of the data on the row defines its field). The major difference between a "normal"...(read more)

Read the article

OWB 11gR2 – Flexible and extensible

- by David Allan

The Oracle data integration extensibility capabilities are something I love, nothing more frustrating than a tool or platform that is very constraining. I think extensibility and flexibility are invaluable capabilities in the data integration arena. I liked Uli Bethke's posting on some extensibility capabilities with ODI (see Nesting ODI Substitution Method Calls here), he has some useful guidance on making customizations to existing KMs, nice to learn by example. I thought I'd illustrate the same capabilities with ODI's partner OWB for the OWB community. There is a whole new world of potential. The LKM/IKM/CKM/JKMs are the primary templates that are supported (plus the Oracle Target code template), so there is a lot of potential for customizing and extending the product in this release. Enough waffle... Diving in at the deep end from Uli's post, in OWB the table operator has a number of additional properties in OWB 11gR2 that let you annotate the column usage with ODI-like properties such as the slowly changing usage or for your own user-defined purpose as in Uli's post, below you see for the target table SALES_TARGET we can use the UD5 property which when assigned the code template (knowledge module) which has been modified with Uli's change we can do custom things such as creating indices - provides The code template used by the mapping has the additional step which is basically the code illustrated from Uli's posting just used directly, the ODI 10g substitution references also supported from within OWB's runtime. Now to see whether this does what we expect before we execute it, we can check out the generated code similar to how the traditional mapping generation and preview works, you do this by clicking on the 'Inspect Code' button on the execution units code template assignment. This then creates another tab with prefix 'Code - <mapping name>' where the generated code is put, scrolling down we find the last step with the indices being created, looks good, so we are ready to deploy and execute. After executing the mapping we can then use the 'Audit Information' panel (select the mapping in the designer tree and click on View/Audit Information), this gives us a view of the execution where we can drill into the tasks that were executed and inspect both the template and the generated code that was executed and any potential errors. Reflecting back on earlier versions of OWB, these were the kinds of features that were always highly desirable, getting under the hood of the code generation and tweaking bit and pieces - fun and powerful stuff! We can step it up a bit here and explore some further ideas. The example below is a daisy-chained set of execution units where the intermediate table is a target of one unit and the source for another. We want that table to be a global temporary table, so can tweak the templates. Back to the copy of SQL Control Append (for demo purposes) we modify the create target table step to make the table a global temporary table, with the option of on commit preserve rows. You can get a feel for some of the customizations and changes possible, providing some great flexibility and extensibility for the data integration tools.

Read the article

Debugging OWB generated SAP ABAP code executed through RFC

- by Anil Menon

Within OWB if you need to execute ABAP code using RFC you will have to use the SAP Function Module RFC_ABAP_INSTALL_AND_RUN. This function module is specified during the creation of the SAP source location. Usually in a Production environment a copy of this function module is used due to security restrictions. When you execute the mapping by using this Function Module you can’t see the actual ABAP code that is passed on to the SAP system. In case you want to take a look at the code that will be executed on the SAP system you need to use a custom Function Module in SAP. The easiest way to do this is to make a copy of the Function Module RFC_ABAP_INSTALL_AND_RUN and call it say Z_TEST_FM. Then edit the code of the Function Module in SAP as below FUNCTION Z_TEST_FM . DATA: BEGIN OF listobj OCCURS 20. INCLUDE STRUCTURE abaplist. DATA: END OF listobj. DATA: begin_of_line(72). DATA: line_end_char(1). DATA: line_length type I. DATA: lin(72). loop at program. append program-line to WRITES. endloop. ENDFUNCTION. Within OWB edit the SAP Location and use Z_TEST_FM as the “Execution Function Module” instead of RFC_ABAP_INSTALL_AND_RUN. Then register this location. The Mapping you want to debug will have to be deployed. After deployment you can right click the mapping and click on “Start”. After clicking start the “Input Parameters” screen will be displayed. You can make changes here if you need to. Check that the parameter BACKGROUND is set to “TRUE”. After Clicking “OK” the log for the execution will be displayed. The execution of Mappings will always fail when you use the above function module. Clicking on the icon “I” (information) the ABAP code will be displayed. The ABAP code displayed is the code that is passed through the Function Module. You can also find the code by going through the log files on the server which hosts the OWB repository. The logs will be located under <OWB_HOME>/owb/log. Patch #12951045 is recommended while using the SAP Connector with OWB 11.2.0.2. For recommended patches for other releases please check with Oracle Support at http://support.oracle.com

Read the article

OWB 11gR2 - Find and Search Metadata in Designer

- by David Allan

Here are some tools and techniques for finding objects, specifically in the design repository. There are ways of navigating and collating objects that are useful for day to day development and build-time usage - this includes features out of the box and utilities constructed on top. There are a variety of techniques to navigate and find objects in the repository, the first 3 are out of the box, the 4th is an expert utility. Navigating by the tree, grouping by project and module - ok if you are aware of the exact module/folder that objects reside in. The structure panel is a useful way of finding parts of an object, especially when large rather than using the canvas. In large scale projects it helps to have accelerators (either find or collections below). Advanced find to search by name - 11gR2 included a find capability specifically for large scale projects. There were improvements in both the tree search and the object editors (including highlighting in mapping for example). So you can now do regular expression based search and quickly navigate to objects within a repository. Collections - logically organize your objects into virtual folders by shortcutting the actual objects. This is useful for a range of things since all the OWB services operate on collections too (export/import, validation, deployment). See the post here for new collection functionality in 11gR2. Reports for searching by type, updated on, updated by etc. Useful for activities such as periodic incremental actions (deploy all mappings changed in the past week). The report style view is useful since I can quickly see who changed what and when. You can see all the audit details for objects within each objects property inspector, but its useful to just get all objects changed today or example, all objects changed since my last build etc. This utility combines both UI extensions via experts and the public views on the repository. In the figure to the right you see the contextual option 'Object Search' which invokes the utility, you can see I have quite a number of modules within my project. Figure out all the potential objects which have been changed is not simple. The utility is an expert which provides this kind of search capability. The utility provides a report of the objects in the design repository which satisfy some filter criteria. The type of criteria includes; objects updated in the last n days optionally filter the objects updated by user filter the user by project and by type (table/mappings etc.) The search dialog appears with these options, you can multi-select the object types, so for example you can select TABLE and MAPPING. Its also possible to search across projects if need be. If you have multiple users using the repository you can define the OWB user name in the 'Updated by' property to restrict the report to just that user also. Finally there is a search name that will be used for some of the options such as building a collection - this name is used for the collection to be built. In the example I have done, I've just searched my project for all process flows and mappings that users have updated in the last 7 days. The results of the query are returned in a table containing the object names, types, full path and audit details. The columns are sort-able, you can sort the results by name, type, path etc. One of the cool things here, is that you can then perform operations on these objects - such as edit them, export single selection or entire results to MDL, create a collection from the results (now you have a saved set of references in the repository, you could do deploy/export etc.), create a deployment script from the results...or even add in your own ideas! You see from this that you can do bulk operations on sets of objects based on search results. So for example selecting the 'Build Collection' option creates a collection with all of the objects from my search, you can subsequently deploy/generate/maintain this collection of objects. Under the hood of the expert if just basic OMB commands from the product and the use of the public views on the design repository. You can see how easy it is to build up macro-like capabilities that will help you do day-to-day as well as build like tasks on sets of objects.

Read the article

ODI 11g – Scripting Repository Creation

- by David Allan

Here’s a quick post on how to create both master and work repositories in one simple dialog, its using the groovy capabilities in ODI 11g and the groovy swing builder components. So if you want more/less take the groovy script and change, its easy stuff. The groovy script odi_create_repos.groovy is here, just open it in ODI before connecting and you will be able to create both master and work repositories with ease – or check the groovy out and script your own automation – you can construct the master, work and runtime repositories, so if you are embedding ODI as your DI engine this may be very useful. When you click ‘Create Repository’ you will see the following in the log as the master repository starts to be created; ====================================================== Repository Creation Started.... ====================================================== Master Repository Creation Started.... Then the completion message followed by the work repository creation and final completion message. Master Repository Creation Completed. Work Repository Creation Started. Work Repository Creation Completed. ====================================================== Repository Creation Completed Successfully ====================================================== Script exited. If any error is hit, the script just exits and prints any error to the log. For example if I enter no passwords, I will get this error; ====================================================== Repository Creation Started.... ====================================================== Master Repository Creation Started.... ====================================================== Repository Creation Complete in Error ====================================================== oracle.odi.setup.RepositorySetupException: oracle.odi.core.security.PasswordPolicyNotMatchedException: ODI-10189: Password policy MinPasswordLength is not matched. ====================================================== Script exited. This is another example of using the ODI 11g SDK showing how to automate the construction of your data integration environment. The main interfaces and classes used here are IMasterRepositorySetup / MasterRepositorySetupImpl and IWorkRepositorySetup / WorkRepositorySetupImpl.

Read the article

It's Official, I'm a Geek

- by andyleonard

I'm honored to join Glen Gordon ( Blog - @glengordon ) and G. Andrew Duthie ( Blog - @devhammer ) today at 3:00 PM EDT for an MSDN Webcast entitled GeekSpeak: Inside SQL Server Integration Services (SSIS). This is a LiveMeeting and you can join in the fun as an attendee here . It's a live show, so bring your questions! :{> Andy Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!...(read more)

Read the article

ODI 11g – How to override SQL at runtime?

- by David Allan

Following on from the posting some time back entitled ‘ODI 11g – Simple, Powerful, Flexible’ here we push the envelope even further. Rather than just having the SQL we override defined statically in the interface design we will have it configurable via a variable….at runtime. Imagine you have a well defined interface shape that you want to be fulfilled and that shape can be satisfied from a number of different sources that is what this allows - or the ability for one interface to consume data from many different places using variables. The cool thing about ODI’s reference API and this is that it can be fantastically flexible and useful. When I use the variable as the option value, and I execute the top level scenario that uses this temporary interface I get prompted (or can get prompted to be correct) for the value of the variable. Note I am using the <@=odiRef.getObjectName("L","EMP", "SCOTT","D")@> notation for the table reference, since this is done at runtime, then the context will resolve to the correct table name etc. Each time I execute, I could use a different source provider (obviously some dependencies on KMs/technologies here). For example, the following groovy snippet first executes and the query uses SCOTT model with EMP, the next time it is from BOB model and the datastore OTHERS. m=new Properties(); m.put("DEMO.SQLSTR", "select empno, deptno from <@=odiRef.getObjectName("L","EMP", "SCOTT","D")@>"); s=new StartupParams(m); runtimeAgent.startScenario("TOP", null, s, null, "GLOBAL", 5, null, true); m2=new Properties(); m2.put("DEMO.SQLSTR", "select empno, deptno from <@=odiRef.getObjectName("L","OTHERS", "BOB","D")@>"); s2=new StartupParams(m); runtimeAgent.startScenario("TOP", null, s2, null, "GLOBAL", 5, null, true); You’ll need a patch to 11.1.1.6 for this type of capability, thanks to my ole buddy Ron Gonzalez from the Enterprise Management group for help pushing the envelope!

Read the article

Why "Tailoring" Your Resume Is Bad

- by Mike C

I was just writing a response to a comment on my "Sell Yourself!" presentation ( http://sqlblog.com/blogs/michael_coles/archive/2010/12/05/sell-yourself-presentation.aspx#comments ), and it started getting a little lengthy so I decided to turn it into a blog post. The "Sell Yourself!" post got a couple of very good comments on the blog, and quite a few more comments offline. I think I'll start this one with a great exchange from the movie "The Princess Bride": Vizzini: HE DIDN'T FALL? INCONCEIVABLE....(read more)

Read the article

ODI 11g – Oracle Multi Table Insert

- by David Allan

With the IKM Oracle Multi Table Insert you can generate Oracle specific DML for inserting into multiple target tables from a single query result – without reprocessing the query or staging its result. When designing this to exploit the IKM you must split the problem into the reusable parts – the select part goes in one interface (I named SELECT_PART), then each target goes in a separate interface (INSERT_SPECIAL and INSERT_REGULAR). So for my statement below… /*INSERT_SPECIAL interface */ insert all when 1=1 And (INCOME_LEVEL > 250000) then into SCOTT.CUSTOMERS_NEW (ID, NAME, GENDER, BIRTH_DATE, MARITAL_STATUS, INCOME_LEVEL, CREDIT_LIMIT, EMAIL, USER_CREATED, DATE_CREATED, USER_MODIFIED, DATE_MODIFIED) values (ID, NAME, GENDER, BIRTH_DATE, MARITAL_STATUS, INCOME_LEVEL, CREDIT_LIMIT, EMAIL, USER_CREATED, DATE_CREATED, USER_MODIFIED, DATE_MODIFIED) /* INSERT_REGULAR interface */ when 1=1 then into SCOTT.CUSTOMERS_SPECIAL (ID, NAME, GENDER, BIRTH_DATE, MARITAL_STATUS, INCOME_LEVEL, CREDIT_LIMIT, EMAIL, USER_CREATED, DATE_CREATED, USER_MODIFIED, DATE_MODIFIED) values (ID, NAME, GENDER, BIRTH_DATE, MARITAL_STATUS, INCOME_LEVEL, CREDIT_LIMIT, EMAIL, USER_CREATED, DATE_CREATED, USER_MODIFIED, DATE_MODIFIED) /*SELECT*PART interface */ select CUSTOMERS.EMAIL EMAIL, CUSTOMERS.CREDIT_LIMIT CREDIT_LIMIT, UPPER(CUSTOMERS.NAME) NAME, CUSTOMERS.USER_MODIFIED USER_MODIFIED, CUSTOMERS.DATE_MODIFIED DATE_MODIFIED, CUSTOMERS.BIRTH_DATE BIRTH_DATE, CUSTOMERS.MARITAL_STATUS MARITAL_STATUS, CUSTOMERS.ID ID, CUSTOMERS.USER_CREATED USER_CREATED, CUSTOMERS.GENDER GENDER, CUSTOMERS.DATE_CREATED DATE_CREATED, CUSTOMERS.INCOME_LEVEL INCOME_LEVEL from SCOTT.CUSTOMERS CUSTOMERS where (1=1) Firstly I create a SELECT_PART temporary interface for the query to be reused and in the IKM assignment I state that it is defining the query, it is not a target and it should not be executed. Then in my INSERT_SPECIAL interface loading a target with a filter, I set define query to false, then set true for the target table and execute to false. This interface uses the SELECT_PART query definition interface as a source. Finally in my final interface loading another target I set define query to false again, set target table to true and execute to true – this is the go run it indicator! To coordinate the statement construction you will need to create a package with the select and insert statements. With 11g you can now execute the package in simulation mode and preview the generated code including the SQL statements. Hopefully this helps shed some light on how you can leverage the Oracle MTI statement. A similar IKM exists for Teradata. The ODI IKM Teradata Multi Statement supports this multi statement request in 11g, here is an extract from the paper at www.teradata.com/white-papers/born-to-be-parallel-eb3053/ Teradata Database offers an SQL extension called a Multi-Statement Request that allows several distinct SQL statements to be bundled together and sent to the optimizer as if they were one. Teradata Database will attempt to execute these SQL statements in parallel. When this feature is used, any sub-expressions that the different SQL statements have in common will be executed once, and the results shared among them. It works in the same way as the ODI MTI IKM, multiple interfaces orchestrated in a package, each interface contributes some SQL, the last interface in the chain executes the multi statement.

Read the article

ODI 11g – How to Load Using Partition Exchange

- by David Allan

Here we will look at how to load large volumes of data efficiently into the Oracle database using a mixture of CTAS and partition exchange loading. The example we will leverage was posted by Mark Rittman a couple of years back on Interval Partitioning, you can find that posting here. The best thing about ODI is that you can encapsulate all those ‘how to’ blog posts and scripts into templates that can be reused – the templates are of course Knowledge Modules. The interface design to mimic Mark's posting is shown below; The IKM I have constructed performs a simple series of steps to perform a CTAS to create the stage table to use in the exchange, then lock the partition (to ensure it exists, it will be created if it doesn’t) then exchange the partition in the target table. You can find the IKM Oracle PEL.xml file here. The IKM performs the follows steps and is meant to illustrate what can be done; So when you use the IKM in an interface you configure the options for hints (for parallelism levels etc), initial extent size, next extent size and the partition variable; The KM has an option where the name of the partition can be passed in, so if you know the name of the partition then set the variable to the name, if you have interval partitioning you probably don’t know the name, so you can use the FOR clause. In my example I set the variable to use the date value of the source data FOR (TO_DATE(''01-FEB-2010'',''dd-MON-yyyy'')) Using a variable lets me invoke the scenario many times loading different partitions of the same target table. Below you can see where this is defined within ODI, I had to double single-quote the strings since this is placed inside the execute immediate tasks in the KM; Note also this example interface uses the LKM Oracle to Oracle (datapump), so this illustration uses a lot of the high performing Oracle database capabilities – it uses Data Pump to unload, then a CreateTableAsSelect (CTAS) is executed on the external table based on top of the Data Pump export. This table is then exchanged in the target. The IKM and illustrations above are using ODI 11.1.1.6 which was needed to get around some bugs in earlier releases with how the variable is handled...as far as I remember.

Read the article

Oracle Magazine - OWB 11gR2 and Heterogeneous Databases

- by David Allan

There's a nice article titled 'Oracle Warehouse Builder 11g Release 2 and Heterogeneous Databases' from Oracle ACE director and cofounder of Rittman Mead Consulting, Mark Rittman in the May/June 2010 Oracle Magazine that covers the heterogeneous database support in OWB 11gR2: http://www.oracle.com/technology/oramag/oracle/10-may/o30bi.html Big thanks to Mark for this write up. There is an Oracle white paper on the support here and for examples of this extensibility you can go to the OWB blog archive where there are quite a few posts. I would recommend the following interesting posts out of the archive architecture overview, bulk file loading, MySQL open connectivity and MySQL bulk extract as interesting posts amongst others.

Read the article

Presenting at Roanoke Code Camp Saturday!

- by andyleonard

Introduction I am honored to once again be selected to present at Roanoke Code Camp ! An Introductory Topic One of my presentations is titled "I See a Control Flow Tab. Now What?" It's a Level 100 talk for those wishing to learn how to build their very first SSIS package. This highly-interactive, demo-intense presentation is for beginners and developers just getting started with SSIS. Attend and learn how to build SSIS packages from the ground up . Designing an SSIS Framework I'm also presenting...(read more)

Read the article

Interesting sessions/tips from RMOUG

- by jean-pierre.dijcks

One of the sessions I was at at last week's RMOUG was a session on Temp Tablespace Groups. I had a look because I had no experience with this and it seemed to help with parallel processing and the allocation/usage of temp. You can read the excellent write-up at Kellyn Pedersen's blog - who did the session and all the work - here. So for all of those who may be seeing lot's of waits like enq: TS - Contention when you are doing hash joins and sorts, do have a look at the above blog post. I also had the chance to listen in at Stewart Bryson's session on Restartability (he had 3 R-s) where he gave very useful tips about how to deal with your data warehouse loads. Questions like archive log mode - should I or should I not were well covered. Flashback archives, also nice to hear about. Very nice talk, very interesting. Unfortunately he hasn't blogged about it yes, so no pointers to that one. Got to see a couple of other interesting sessions, and as conferences go got to meet some interesting Oracle folks from the region. As usual RMOUG was useful and fun. Off to the drawing boards to design next year's session!

Read the article

OWB 11gR2 – JDBC Helper Utility

- by David Allan

One of the common queries when importing the tables via JDBC with 11gR2 is determining why the import wizard doesn’t display the tables that you think it should. I often just use the script below to dump out the schemas, tables and columns that the JDBC driver is returning. This is useful in a few areas; to figure out what the schema name is returned to double check with the schema name you have used in the location (this is used in the DatabaseMetaData.getTables API call within the basic JDBC metadata import. to figure out the data types returned from the JDBC driver when you see columns skipped because of no datatype supported messages. also…I can do it via scripting and don’t need to recompile classes and stuff :-) Edit the tcl script and set the JDBC driver, the connection URL and the username and password (they are at the bottom of the script), the script then calls a basic tcl procedure which writes to standard out the schemas, tables and columns with various properties. For example I executed it using the XML JDBC driver from ODI over a simple customers XML file and it writes the following metadata; You can add more details as you need and execute from the OMBPlus panel within OWB. Download the sample tcl jdbc script here There is a bunch of really useful stuff on OTN documenting this area (start with the white paper here) that is worth checking out all related to the OWB SDK covering everything from platform definitions, custom metadata importers, application adapters, code templates etc. You can find a bunch of goodies on the OWB SDK here.

Read the article

T-SQL Tuesday: Aggregations in SSIS

- by andyleonard

Introduction Jes Borland ( Blog | @grrl_geek ) is hosting this month's T-SQL Tuesday - started by SQLBlog's own Adam Machanic ( Blog | @AdamMachanic ) - and it is about aggregation. I thought I'd show a couple ways to do aggregation using SSIS. The Aggregate Transformation in SSIS The Aggregate transform in SSIS is fast . I built an SSIS package (AggregateScripts.dtsx) with two Data Flow Tasks (Using the Aggregate Transform and Using a Script Component). Using the Aggregate Transform looks like this:...(read more)

Read the article

External table and preprocessor for loading LOBs

- by David Allan

I was using the COLUMN TRANSFORMS syntax to load LOBs into Oracle using the Oracle external which is a handy way of doing several stuff - from loading LOBs from the filesystem to having constants as fields. In OWB you can use unbound external tables to define an external table using your own arbitrary access parameters - I blogged a while back on this for doing preprocessing before it was added into OWB 11gR2. For loading LOBs using the COLUMN TRANSFORMS syntax have a read through this post on loading CLOB, BLOB or any LOB, the files to load can be specified as a field that is a filename field, the content of this file will be the LOB data. So using the example from the linked post, you can define the columns; Then define the access parameters - if you go the unbound external table route you can can put whatever you want in here (your external table get out of jail free card); This will let you read the LOB files fromn the filesystem and use the external table in a mapping. Pushing the envelope a little further I then thought about marrying together the preprocessor with the COLUMN TRANSFORMS, this would have let me have a shell script for example as the preprocessor which listed the contents of a directory and let me read the files as LOBs via an external table. Unfortunately that doesn't quote work - there is now a bug/enhancement logged, so one day maybe. So I'm afraid my blog title was a little bit of a teaser....

Read the article

OWB 11gR2 – OMB and File Editing

- by David Allan

Here we will see how we can use the IDE for editing OMB scripts. The 11gR2 release is based on the common Oracle platform IDE used also by JDeveloper. It comes with a bunch of standard behavior for editing and rendering code. One of the lesser known things is that if you drop a text file into OWB you can edit it. So you can drop your tcl scripts right into OWB and edit in-place, and don’t need another IDE like Eclipse just for this task. Cool, so you have the file here. There may be no line numbers, you can toggle line numbers on by right clicking in the gutter. If we edit the file within the OWB IDE, the save is a little different from normal. OWB doesn’t normally manipulate files so things like ctrl-s to save, saves the OWB objects, but if you edit a file the closing of the file will ask if you want to save it – check it out. Now we enter the realm of ‘he who dares’…. Note the IDE doesn’t know about tcl files out of the box, so you see above there is no syntax highlighting. The code is identified by the extension… .java is java, .html is HTML etc. With OWB, the OMB scripts are tcl, we usually have .tcl extension on these files. One of the things we can do to trick up the syntax highlighting is to simply rename the file to have a .java suffix, then all of a sudden we get syntax highlighting, see the illustration here where side by side we see a the file with a .java extension and a .tcl extension. Not ideal pretending to be .java but gets us a way to having something more useful than notepad. We can then change the syntax highlighting such that we get Eclipse like highlighting within the IDE from the Tools Preferences option; You then get the Eclipse like rendering albeit using a little tweak on the file names… Might be useful if you are doing any kind of heavy duty OMB script development and just want a single IDE. The OMBPlus panel is then at hand for executing and testing it out.

Read the article

OWB - 11.2.0.4 Windows standalone client released

- by David Allan

The 11.2.0.4 release of OWB containing the 32 bit and 64 bit standalone Windows client is released today, I had previously blogged about the Linux standalone client here. Big thanks to Anil for spearheading that, another milestone on the Data Integration roadmap. Below are the patch numbers; 17743124 - OWB 11.2.0.4 STANDALONE CLIENT FOR Windows 64 BIT 17743119 - OWB 11.2.0.4 STANDALONE CLIENT FOR Windows 32 BIT This is the terminal release of OWB and customer bugs will be resolved on top of this release. We are excited to share information on the Oracle Data Integration 12c release in our upcoming launch video webcast on November 12th.

Read the article

Have You Downloaded SQL Server 2012 Evaluation Edition? Why Not?!

- by andyleonard

I am installing SQL Server 2012 Evaluation Edition on a virtual machine as I type. You can do this. Here’s one way: Grab some virtual machine software. I like Oracle VirtualBox . It’s cool. It’s free. Install VirtualBox. Download the 180-day free trial of Windows Server 2008 R2 . Also cool. Also free. Once Windows Server 2008 R2 is downloaded, build a VirtualBox VM. Download and install SQL Server 2012 Evaluation Edition ! That’s all there is to it. You can get started today, no need to wait until...(read more)

Read the article

Data Warehouse ETL slow - change primary key in dimension?

- by Jubbles

I have a working MySQL data warehouse that is organized as a star schema and I am using Talend Open Studio for Data Integration 5.1 to create the ETL process. I would like this process to run once per day. I have estimated that one of the dimension tables (dimUser) will have approximately 2 million records and 23 columns. I created a small test ETL process in Talend that worked, but given the amount of data that may need to be updated daily, the current performance will not cut it. It takes the ETL process four minutes to UPDATE or INSERT 100 records to dimUser. If I assumed a linear relationship between the count of records and the amount of time to UPDATE or INSERT, then there is no way the ETL can finish in 3-4 hours (my hope), let alone one day. Since I'm unfamiliar with Java, I wrote the ETL as a Python script and ran into the same problem. Although, I did discover that if I did only INSERT, the process went much faster. I am pretty sure that the bottleneck is caused by the UPDATE statements. The primary key in dimUser is an auto-increment integer. My friend suggested that I scrap this primary key and replace it with a multi-field primary key (in my case, 2-3 fields). Before I rip the test data out of my warehouse and change the schema, can anyone provide suggestions or guidelines related to the design of the data warehouse the ETL process how realistic it is to have an ETL process INSERT or UPDATE a few million records each day will my friend's suggestion significantly help If you need any further information, just let me know and I'll post it. UPDATE - additional information: mysql> describe dimUser; Field Type Null Key Default Extra user_key int(10) unsigned NO PRI NULL auto_increment id_A int(10) unsigned NO NULL id_B int(10) unsigned NO NULL field_4 tinyint(4) unsigned NO 0 field_5 varchar(50) YES NULL city varchar(50) YES NULL state varchar(2) YES NULL country varchar(50) YES NULL zip_code varchar(10) NO 99999 field_10 tinyint(1) NO 0 field_11 tinyint(1) NO 0 field_12 tinyint(1) NO 0 field_13 tinyint(1) NO 1 field_14 tinyint(1) NO 0 field_15 tinyint(1) NO 0 field_16 tinyint(1) NO 0 field_17 tinyint(1) NO 1 field_18 tinyint(1) NO 0 field_19 tinyint(1) NO 0 field_20 tinyint(1) NO 0 create_date datetime NO 2012-01-01 00:00:00 last_update datetime NO 2012-01-01 00:00:00 run_id int(10) unsigned NO 999 I used a surrogate key because I had read that it was good practice. Since, from a business perspective, I want to keep aware of potential fraudulent activity (say for 200 days a user is associated with state X and then the next day they are associated with state Y - they could have moved or their account could have been compromised), so that is why geographic data is kept. The field id_B may have a few distinct values of id_A associated with it, but I am interested in knowing distinct (id_A, id_B) tuples. In the context of this information, my friend suggested that something like (id_A, id_B, zip_code) be the primary key. For the large majority of daily ETL processes (80%), I only expect the following fields to be updated for existing records: field_10 - field_14, last_update, and run_id (this field is a foreign key to my etlLog table and is used for ETL auditing purposes).

Read the article

Data validate tools (ETL tools) for SQL server

- by Stan

I have some data in Excel and need to import into database. Is there any tool that can validate and maybe clean the data? Does Red Gate have such tool? The input will be Excel. Given table constraints, eg. CHECK, UNIQUE KEY, datetime format, NOT NULL. Desire output should be as least shows which lines are having problems, and then fix some trivial error automatically, like fill in default value for NULL columns, automatically correct datetime format. I know using Python can build such a script. But just wonder what's the popular way to do this. Thanks.

Read the article

How to figure out which record has been deleted in an effiecient way?

- by janetsmith

Hi, I am working on an in-house ETL solution, from db1 (Oracle) to db2 (Sybase). We needs to transfer data incrementally (Change Data Capture?) into db2. I have only read access to tables, so I can't create any table or trigger in Oracle db1. The challenge I am facing is, how to detect record deletion in Oracle? The solution which I can think of, is by using additional standalone/embedded db (e.g. derby, h2 etc). This db contains 2 tables, namely old_data, new_data. old_data contains primary key field from tahle of interest in Oracle. Every time ETL process runs, new_data table will be populated with primary key field from Oracle table. After that, I will run the following sql command to get the deleted rows: SELECT old_data.id FROM old_data WHERE old_data.id NOT IN (SELECT new_data.id FROM new_data) I think this will be a very expensive operation when the volume of data become very large. Do you have any better idea of doing this? Thanks.

Search Results

Search found 266 results on 11 pages for 'etl'.

Page 3/11 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 | Next Page >

- by Raj More

- by Kevin Yang

- by David Allan

- by andyleonard

- by David Allan

- by Anil Menon

- by David Allan

- by David Allan

- by andyleonard

- by David Allan

- by Mike C

- by David Allan

- by David Allan

- by David Allan

- by andyleonard

- by jean-pierre.dijcks

- by David Allan

- by andyleonard

- by David Allan

- by David Allan

- by David Allan

- by andyleonard

- by Jubbles

- by Stan

- by janetsmith

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 | Next Page >