Search Results

Search found 266 results on 11 pages for 'etl'.

Page 2/11 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11  | Next Page >

  • SQL SERVER – SSIS Parameters in Parent-Child ETL Architectures – Notes from the Field #040

    - by Pinal Dave
    [Notes from Pinal]: SSIS is very well explored subject, however, there are so many interesting elements when we read, we learn something new. A similar concept has been Parent-Child ETL architecture’s relationship in SSIS. Linchpin People are database coaches and wellness experts for a data driven world. In this 40th episode of the Notes from the Fields series database expert Tim Mitchell (partner at Linchpin People) shares very interesting conversation related to how to understand SSIS Parameters in Parent-Child ETL Architectures. In this brief Notes from the Field post, I will review the use of SSIS parameters in parent-child ETL architectures. A very common design pattern used in SQL Server Integration Services is one I call the parent-child pattern.  Simply put, this is a pattern in which packages are executed by other packages.  An ETL infrastructure built using small, single-purpose packages is very often easier to develop, debug, and troubleshoot than large, monolithic packages.  For a more in-depth look at parent-child architectures, check out my earlier blog post on this topic. When using the parent-child design pattern, you will frequently need to pass values from the calling (parent) package to the called (child) package.  In older versions of SSIS, this process was possible but not necessarily simple.  When using SSIS 2005 or 2008, or even when using SSIS 2012 or 2014 in package deployment mode, you would have to create package configurations to pass values from parent to child packages.  Package configurations, while effective, were not the easiest tool to work with.  Fortunately, starting with SSIS in SQL Server 2012, you can now use package parameters for this purpose. In the example I will use for this demonstration, I’ll create two packages: one intended for use as a child package, and the other configured to execute said child package.  In the parent package I’m going to build a for each loop container in SSIS, and use package parameters to pass in a value – specifically, a ClientID – for each iteration of the loop.  The child package will be executed from within the for each loop, and will create one output file for each client, with the source query and filename dependent on the ClientID received from the parent package. Configuring the Child and Parent Packages When you create a new package, you’ll see the Parameters tab at the package level.  Clicking over to that tab allows you to add, edit, or delete package parameters. As shown above, the sample package has two parameters.  Note that I’ve set the name, data type, and default value for each of these.  Also note the column entitled Required: this allows me to specify whether the parameter value is optional (the default behavior) or required for package execution.  In this example, I have one parameter that is required, and the other is not. Let’s shift over to the parent package briefly, and demonstrate how to supply values to these parameters in the child package.  Using the execute package task, you can easily map variable values in the parent package to parameters in the child package. The execute package task in the parent package, shown above, has the variable vThisClient from the parent package mapped to the pClientID parameter shown earlier in the child package.  Note that there is no value mapped to the child package parameter named pOutputFolder.  Since this parameter has the Required property set to False, we don’t have to specify a value for it, which will cause that parameter to use the default value we supplied when designing the child pacakge. The last step in the parent package is to create the for each loop container I mentioned earlier, and place the execute package task inside it.  I’m using an object variable to store the distinct client ID values, and I use that as the iterator for the loop (I describe how to do this more in depth here).  For each iteration of the loop, a different client ID value will be passed into the child package parameter. The final step is to configure the child package to actually do something meaningful with the parameter values passed into it.  In this case, I’ve modified the OleDB source query to use the pClientID value in the WHERE clause of the query to restrict results for each iteration to a single client’s data.  Additionally, I’ll use both the pClientID and pOutputFolder parameters to dynamically build the output filename. As shown, the pClientID is used in the WHERE clause, so we only get the current client’s invoices for each iteration of the loop. For the flat file connection, I’m setting the Connection String property using an expression that engages both of the parameters for this package, as shown above. Parting Thoughts There are many uses for package parameters beyond a simple parent-child design pattern.  For example, you can create standalone packages (those not intended to be used as a child package) and still use parameters.  Parameter values may be supplied to a package directly at runtime by a SQL Server Agent job, through the command line (via dtexec.exe), or through T-SQL. Also, you can also have project parameters as well as package parameters.  Project parameters work in much the same way as package parameters, but the parameters apply to all packages in a project, not just a single package. Conclusion Of the numerous advantages of using catalog deployment model in SSIS 2012 and beyond, package parameters are near the top of the list.  Parameters allow you to easily share values from parent to child packages, enabling more dynamic behavior and better code encapsulation. If you want me to take a look at your server and its settings, or if your server is facing any issue we can Fix Your SQL Server. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Notes from the Field, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Heterogén adatelérés OWB-vel: ODI EE Enterprise ETL

    - by Fekete Zoltán
    Az elozo ketto blogbejegyzéshez kapcsolódva felmerül a kérdés: Hogyan lehet az Oracle Warehouse Builderrel heterogén adatforrásokat elérni? Ajánlott olvasmány: Oracle Warehouse Builder 11gR2: OWB ETL Using ODI Knowledge Modules Természetesen az OWB az Oracle Database Heterogeneous Services-zel ODBC-vel illetve Oracle Gateway-k alkalmazásával eddig is lehetett mindenféle ODBC kompatibilis továbbá mainframe-es adatbázisokat elérni. Oracle Database Gateways: MS SQL Server, Sybase, Teradata, Informix, ODBC, DRDA, APPC, WebSphere MQ, DB2, DB2/400. A megfelelo Application Adapters megvásárlásával lehet csatlakozni az OWB-vel például a következo forrásokhoz: SAP, Oracle E-Business Suite, Peoplesoft, Siebel, Oracle Customer Data Hub (CDH), Universal Customer Master (UCM), Product Information Management (PIM). Az OWB 11gR2-tol kezdve az OWB tudja használni az Oracle Data Integrator Knowledge moduljait a heterogén adatelérésre, ez JDBC-vel illetve más heterogén elérési módokkal. Ajánlott olvasmány: Oracle Warehouse Builder 11gR2: OWB ETL Using ODI Knowledge Modules Letöltés: Oracle Warehouse Builder. BTW az OWB Java-s kliens szoftver Linux-on és Windows-on is használható. A szerver oldal pedig természetesen az Oracle adatbázisban fut: Solaris, Linux, HP-UX, AIX, Windows operációs rendszereken.

    Read the article

  • Do reactive extensions and ETL go together?

    - by Aaron Anodide
    I don't fully understand reactive extensions, but my inital reading caused me think about the ETL code I have. Right now its basically a workflow to to perform various operations in a certain sequence based on conditions it find as it progresses. I can also imagine an event driven way such that only a small amount of imperative logic causes a chain reaction to occur. Of course I don't need a new type of programming model to make an event driven collaboration like that. Just the same I am wondering if ETL is a good fit for potentially exploring Rx further. Is my connection in a valid direction even? If not, could you briefly correct the error in my logic?

    Read the article

  • Extracting, Transforming, and Loading (ETL) Process

    The process of Extracting, Transforming, and Loading data in to a data warehouse is called Extract Transform Load (ETL) process.  This process can be used to obtain, analyze, and clean data from various data sources so that it can be stored in a uniform manner within a data warehouse. This data can then be used by various business intelligence processes to provide an organization with more of an in depth analysis of the current state of the company and where it is heading. A standard ETL process that might be used by a health care system may include importing all of their patients names, diagnoses and prescriptions in to a unified data warehouse so that trends can be spotted in regards to outbreaks like the flu and also predict potential illness that a patient might be affected by based on other patients with similar symptoms.

    Read the article

  • Service Broker, not ETL

    - by jamiet
    I have been very quiet on this blog of late and one reason for that is I have been very busy on a client project that I would like to talk about a little here. The client that I have been working for has a website that runs on a distributed architecture utilising a messaging infrastructure for communication between different endpoints. My brief was to build a system that could consume these messages and produce analytical information in near-real-time. More specifically I basically had to deliver a data warehouse however it was the real-time aspect of the project that really intrigued me. This real-time requirement meant that using an Extract transformation, Load (ETL) tool was out of the question and so I had no choice but to write T-SQL code (i.e. stored-procedures) to process the incoming messages and load the data into the data warehouse. This concerned me though – I had no way to control the rate at which data would arrive into the system yet we were going to have end-users querying the system at the same time that those messages were arriving; the potential for contention in such a scenario was pretty high and and was something I wanted to minimise as much as possible. Moreover I did not want the processing of data inside the data warehouse to have any impact on the customer-facing website. As you have probably guessed from the title of this blog post this is where Service Broker stepped in! For those that have not heard of it Service Broker is a queuing technology that has been built into SQL Server since SQL Server 2005. It provides a number of features however the one that was of interest to me was the fact that it facilitates asynchronous data processing which, in layman’s terms, means the ability to process some data without requiring the system that supplied the data having to wait for the response. That was a crucial feature because on this project the customer-facing website (in effect an OLTP system) would be calling one of our stored procedures with each message – we did not want to cause the OLTP system to wait on us every time we processed one of those messages. This asynchronous nature also helps to alleviate the contention problem because the asynchronous processing activity is handled just like any other task in the database engine and hence can wait on another task (such as an end-user query). Service Broker it was then! The stored procedure called by the OLTP system would simply put the message onto a queue and we would use a feature called activation to pick each message off the queue in turn and process it into the warehouse. At the time of writing the system is not yet up to full capacity but so far everything seems to be working OK (touch wood) and crucially our users are seeing data in near-real-time. By near-real-time I am talking about latencies of a few minutes at most and to someone like me who is used to building systems that have overnight latencies that is a huge step forward! So then, am I advocating that you all go out and dump your ETL tools? Of course not, no! What this project has taught me though is that in certain scenarios there may be better ways to implement a data warehouse system then the traditional “load data in overnight” approach that we are all used to. Moreover I have really enjoyed getting to grips with a new technology and even if you don’t want to use Service Broker you might want to consider asynchronous messaging architectures for your BI/data warehousing solutions in the future. This has been a very high level overview of my use of Service Broker and I have deliberately left out much of the minutiae of what has been a very challenging implementation. Nonetheless I hope I have caused you to reflect upon your own approaches to BI and question whether other approaches may be more tenable. All comments and questions gratefully received! Lastly, if you have never used Service Broker before and want to kick the tyres I have provided below a very simple “Service Broker Hello World” script that will create all of the objects required to facilitate Service Broker communications and then send the message “Hello World” from one place to anther! This doesn’t represent a “proper” implementation per se because it doesn’t close down down conversation objects (which you should always do in a real-world scenario) but its enough to demonstrate the capabilities! @Jamiet ----------------------------------------------------------------------------------------------- /*This is a basic Service Broker Hello World app. Have fun! -Jamie */ USE MASTER GO CREATE DATABASE SBTest GO --Turn Service Broker on! ALTER DATABASE SBTest SET ENABLE_BROKER GO USE SBTest GO -- 1) we need to create a message type. Note that our message type is -- very simple and allowed any type of content CREATE MESSAGE TYPE HelloMessage VALIDATION = NONE GO -- 2) Once the message type has been created, we need to create a contract -- that specifies who can send what types of messages CREATE CONTRACT HelloContract (HelloMessage SENT BY INITIATOR) GO --We can query the metadata of the objects we just created SELECT * FROM   sys.service_message_types WHERE name = 'HelloMessage'; SELECT * FROM   sys.service_contracts WHERE name = 'HelloContract'; SELECT * FROM   sys.service_contract_message_usages WHERE  service_contract_id IN (SELECT service_contract_id FROM sys.service_contracts WHERE name = 'HelloContract') AND        message_type_id IN (SELECT message_type_id FROM sys.service_message_types WHERE name = 'HelloMessage'); -- 3) The communication is between two endpoints. Thus, we need two queues to -- hold messages CREATE QUEUE SenderQueue CREATE QUEUE ReceiverQueue GO --more querying metatda SELECT * FROM sys.service_queues WHERE name IN ('SenderQueue','ReceiverQueue'); --we can also select from the queues as if they were tables SELECT * FROM SenderQueue   SELECT * FROM ReceiverQueue   -- 4) Create the required services and bind them to be above created queues CREATE SERVICE Sender   ON QUEUE SenderQueue CREATE SERVICE Receiver   ON QUEUE ReceiverQueue (HelloContract) GO --more querying metadata SELECT * FROM sys.services WHERE name IN ('Receiver','Sender'); -- 5) At this point, we can begin the conversation between the two services by -- sending messages DECLARE @conversationHandle UNIQUEIDENTIFIER DECLARE @message NVARCHAR(100) BEGIN   BEGIN TRANSACTION;   BEGIN DIALOG @conversationHandle         FROM SERVICE Sender         TO SERVICE 'Receiver'         ON CONTRACT HelloContract WITH ENCRYPTION=OFF   -- Send a message on the conversation   SET @message = N'Hello, World';   SEND  ON CONVERSATION @conversationHandle         MESSAGE TYPE HelloMessage (@message)   COMMIT TRANSACTION END GO --check contents of queues SELECT * FROM SenderQueue   SELECT * FROM ReceiverQueue   GO -- Receive a message from the queue RECEIVE CONVERT(NVARCHAR(MAX), message_body) AS MESSAGE FROM ReceiverQueue GO --If no messages were received and/or you can't see anything on the queues you may wish to check the following for clues: SELECT * FROM sys.transmission_queue -- Cleanup DROP SERVICE Sender DROP SERVICE Receiver DROP QUEUE SenderQueue DROP QUEUE ReceiverQueue DROP CONTRACT HelloContract DROP MESSAGE TYPE HelloMessage GO USE MASTER GO DROP DATABASE SBTest GO

    Read the article

  • SQL SERVER – Integrate Your Data with Skyvia – Cloud ETL Solution

    - by Pinal Dave
    In our days data integration often becomes a key aspect of business success. For business analysts it’s very important to get integrated data from various sources, such as relational databases, cloud CRMs, etc. to make correct and successful decisions. There are various data integration solutions on market, and today I will tell about one of them – Skyvia. Skyvia is a cloud data integration service, which allows integrating data in cloud CRMs and different relational databases. It is a completely online solution and does not require anything except for a browser. Skyvia provides powerful etl tools for data import, export, replication, and synchronization for SQL Server and other databases and cloud CRMs. You can use Skyvia data import tools to load data from various sources to SQL Server (and SQL Azure). Skyvia supports such cloud CRMs as Salesforce and Microsoft Dynamics CRM and such databases as MySQL and PostgreSQL. You even can migrate data from SQL Server to SQL Server, or from SQL Server to other databases and cloud CRMs. Additionally Skyvia supports import of CSV files, either uploaded manually or stored on cloud file storage services, such as Dropbox, Box, Google Drive, or FTP servers. When data import is not enough, Skyvia offers bidirectional data synchronization. With this tool, you can synchronize SQL Server data with other databases and cloud CRMs. After performing the first synchronization, Skyvia tracks data changes in the synchronized data storages. In SQL Server databases (and other relational databases) it creates additional tracking tables and triggers. This allows synchronizing only the changed data. Skyvia also maps records by their primary key values to each other, so it does not require different sources to have the same primary key structure. It still can match the corresponding records without having to add any additional columns or changing data structure. The only requirement for synchronization is that primary keys must be autogenerated. With Skyvia it’s not necessary for data to have the same structure in integrated data storages. Skyvia supports powerful mapping mechanisms that allow synchronizing data with completely different structure. It provides support for complex mathematical and string expressions when mapping data, using lookups, etc. You may use data splitting – loading data from a single CSV file or source table to multiple related target tables. Or you may load data from several source CSV files or tables to several related target tables. In each case Skyvia preserves data relations. It builds corresponding relations between the target data automatically. When you often work with cloud CRM data, native CRM data reporting and analysis tools may be not enough for you. And there is a vast set of professional data analysis and reporting tools available for SQL Server. With Skyvia you can quickly copy your cloud CRM data to an SQL Server database and apply corresponding SQL Server tools to the data. In such case you can use Skyvia data replication tools. It allows you to quickly copy cloud CRM data to SQL Server or other databases without customizing any mapping. You need just to specify columns to copy data from. Target database tables will be created automatically. Skyvia offers powerful filtering settings to replicate only the records you need. Skyvia also provides capability to export data from SQL Server (including SQL Azure) and other databases and cloud CRMs to CSV files. These files can be either downloadable manually or loaded to cloud file storages or FTP server. You can use export, for example, to backup SQL Azure data to Dropbox. Any data integration operation can be scheduled for automatic execution. Thus, you can automate your SQL Azure data backup or data synchronization – just configure it once, then schedule it, and benefit from automatic data integration with Skyvia. Currently registration and using Skyvia is completely free, so you can try it yourself and find out whether its data migration and integration tools suits for you. Visit this link to register on Skyvia: https://app.skyvia.com/register Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL Tagged: Cloud Computing

    Read the article

  • OBIA on Teradata - Part 2 Teradata DB Utilization for ETL

    - by Mohan Ramanuja
    Techniques to Monitor Queries and ETL Load CPU and Disk I/OSelect username, processor, sum(cputime), sum(diskio) from dbc.ampusage where processor ='1-0' order by 2,3 descgroup by 1,2;UserName    Vproc    Sum(CpuTime)    Sum(DiskIO)AC00916        10    6.71            24975 List Hardware ErrorsThere is a possibility that the system might have adequate disk space but out of free cylinders. In order to monitor hardware errors, the following query was used:Select * from dbc.Software_Event_Log where Text like '%restart%' order by thedate, thetime;For active users, usage of CPU and analysis of bad CPU to I/O ratiosSelect * from DBC.AMPUSAGE where username='CRMSTGC_DEV_ID';  AND SUBSTR(ACCOUNTNAME,6,3)='006'; Usage By I/OSelect AccountName, UserName, sum(CpuTime), sum(DiskIO)  from DBC.AMPUSAGE group by AccountName, UserName Order by Sum(DiskIO) desc; AccountName                       UserName                          Sum(CpuTime)  Sum(DiskIO)$M1$10062209                      AB89487                           374628.612    7821847$M1$10062210                      AB89487                           186692.244    2799412$M1$10062213                      COC_ETL_ID                        119531.068    331100426$M1$10062200                      AB63472                           118973.316    109881984$M1$10062204                      AB63472                           110825.356    94666986$M1$10062201                      AB63472                           110797.976    75016994$M1$10062202                      AC06936                           100924.448    407839702$M1$10062204                      AB67963                           0         4$M1$10062207                      AB91990                           0         2$M1$10062208                      AB63461                           0         24$M1$10062211                      AB84332                           0         6$M1$10062214                      AB65484                           0         8$M1$10062205                      AB77529                           0         58$M1$10062210                      AC04768                           0         36$M1$10062206                      AB54940                           0         22 Usage By CPUSelect AccountName, UserName, sum(CpuTime), sum(DiskIO)  from DBC.AMPUSAGE group by AccountName, UserName Order by Sum(CpuTime) desc;AccountName                       UserName                          Sum(CpuTime)  Sum(DiskIO)$M1$10062209                      AB89487                           374628.612    7821847$M1$10062210                      AB89487                           186692.244    2799412$M1$10062213                      COC_ETL_ID                        119531.068    331100426$M1$10062200                      AB63472                           118973.316    109881984$M1$10062204                      AB63472                           110825.356    94666986$M1$10062201                      AB63472                           110797.976    75016994$M2$100622105813004760047LOAD     T23_ETLPROC_ENT                   0 6$M1$10062215                      AA37720                           0     180$M1$10062209                      AB81670                           0     6Select count(distinct vproc) from dbc.ampusage;432select * from dbc.dbcinfo;AccountName     UserName     CpuTime DiskIO  CpuTimeNorm         Vproc VprocType    Model$M1$10062205                      CRM_STGC_DEV_ID                   0.32    1764    12.7423999023438    0     AMP      2580$M1$10062205                      CRM_STGC_DEV_ID                   0.28    1730    11.1495999145508    3     AMP      2580$M1$10062205                      CRM_STGC_DEV_ID                   0.304    1736    12.1052799072266    4    AMP      2580$M1$10062205                      CRM_STGC_DEV_ID                   0.248    1731    9.87535992431641    7    AMP      2580$M1$10062205                      CRM_STGC_DEV_ID                   0.332    1731    13.2202398986816    8    AMP      2580$M1$10062205                      CRM_STGC_DEV_ID                   0.284    1712    11.3088799133301    11   AMP      2580$M1$10062205                      CRM_STGC_DEV_ID                   0.24    1757    9.55679992675781    12    AMP      2580$M1$10062205                      CRM_STGC_DEV_ID                   0.292    1737    11.6274399108887    15   AMP      2580$M1$10062205                      CRM_STGC_DEV_ID                   0.268    1753    10.6717599182129    16   AMP      2580$M1$10062205                      CRM_STGC_DEV_ID                   0.276    1732    10.9903199157715    19   AMP      2580select * from dbc.dbcinfo;InfoKey    InfoDataLANGUAGE   SUPPORT           MODE    StandardRELEASE    12.00.03.03VERSION    12.00.03.01a

    Read the article

  • Straight Java/Groovy versus ETL tool (Talend/etc) - what libraries would you use?

    - by Alex R
    Assume you have a small project which on the surface looks like a good match for an ETL tool like Talend. But assume further, that you have never used Talend and furthermore, you do not trust "visual programming" tools in general and would rather code everything the old fashioned way (text on a nice IDE!) with the help of an appropriate language & support libraries. What are some language patterns & support libraries that could help you stay away from the ETL tool temptation/trap?

    Read the article

  • Speaking - Automate Your ETL Infrastructure with SSIS and PowerShell

    - by AllenMWhite
    Today at 4:45PM EDT I'm presenting a new session using PowerShell to auto-generate SSIS packages via the BIML language. The really cool thing is that this session will be live broadcast on PASS TV! You can view the session by clicking on this link . If you have questions for me during the session, you can send them to me via Twitter using this hashtag: #posh2biml Brian Davis, my good friend from the Ohio North SQL Server Users Group, will be monitoring that hashtag and feeding me the questions that...(read more)

    Read the article

  • The ETL from Hell - Diagnosing Batch System Performance Issues

    Too often, the batch systems that underlie a lot of database processing just grow without conscious design. When runs start to extend beyond their allotted time, and tuning no longer solves the problem, it is often discovered that batches are run in series, with draconian error handling. It is time to impose some rational design, and Nigel is a seasoned healer of batch processes. The seven tools in the SQL DBA Bundle support your core SQL Server database administration tasks.Make backups a breeze! Enjoy trouble-free troubleshooting! Make the most of monitoring! Download a free trial now.

    Read the article

  • Can you safely rely upon Yahoo Pipes to offload ETL for your application?

    - by Daniel DiPaolo
    Yahoo Pipes are a very intriguing choice for sort of a poor-man's server-free ETL solution, but would it be a good idea to build an application around one or many Pipes? I've really only used them for toy things here and there, with the only thing I've used longer than a week or two being one amalgamated and filtered RSS feed that I've plugged into Google Reader (which has worked great, but if it goes out for a while I wouldn't notice). So, my question is, would building an application around Yahoo Pipes be reliable (available most of the time)? Ideally it'd be something I could rely on being up 99+% of the time. It looks like the Pipes Terms of Use permit building apps around it, but I am unfamiliar with anyone building anything significant using them.

    Read the article

  • ETL : Tracking changes to data using Materialized View log

    - by avinash
    I am into designing ETL with source and target database as oracle Standard Edition. For ETL purpose I need to get the changed data everytime.Client does not want any changes to be made in source objects. Is it feasible to create Materialized view log on source database using dblink to track Inser/Update/Delete on the identified tables. Thanks and Regards

    Read the article

  • OWB 11gR2 &ndash; OLAP and Simba

    - by David Allan
    Oracle Warehouse Builder was the first ETL product to provide a single integrated and complete environment for managing enterprise data warehouse solutions that also incorporate multi-dimensional schemas. The OWB 11gR2 release provides Oracle OLAP 11g deployment for multi-dimensional models (in addition to support for prior releases of OLAP). This means users can easily utilize Simba's MDX Provider for Oracle OLAP (see here for details and cost) which allows you to use the powerful and popular ad hoc query and analysis capabilities of Microsoft Excel PivotTables® and PivotCharts® with your Oracle OLAP business intelligence data. The extensions to the dimensional modeling capabilities have been built on established relational concepts, with the option to seamlessly move from a relational deployment model to a multi-dimensional model at the click of a button. This now means that ETL designers can logically model a complete data warehouse solution using one single tool and control the physical implementation of a logical model at deployment time. As a result data warehouse projects that need to provide a multi-dimensional model as part of the overall solution can be designed and implemented faster and more efficiently. Wizards for dimensions and cubes let you quickly build dimensional models and realize either relationally or as an Oracle database OLAP implementation, both 10g and 11g formats are supported based on a configuration option. The wizard provides a good first cut definition and the objects can be further refined in the editor. Both wizards let you choose the implementation, to deploy to OLAP in the database select MOLAP: multidimensional storage. You will then be asked what levels and attributes are to be defined, by default the wizard creates a level bases hierarchy, parent child hierarchies can be defined in the editor. Once the dimension or cube has been designed there are special mapping operators that make it easy to load data into the objects, below we load a constant value for the total level and the other levels from a source table.   Again when the cube is defined using the wizard we can edit the cube and define a number of analytic calculations by using the 'generate calculated measures' option on the measures panel. This lets you very easily add a lot of rich analytic measures to your cube. For example one of the measures is the percentage difference from a year ago which we can see in detail below. You can also add your own custom calculations to leverage the capabilities of the Oracle OLAP option, either by selecting existing template types such as moving averages to defining true custom expressions. The 11g OLAP option now supports percentage based summarization (the amount of data to precompute and store), this is available from the option 'cost based aggregation' in the cube's configuration. Ensure all measure-dimensions level based aggregation is switched off (on the cube-dimension panel) - previously level based aggregation was the only option. The 11g generated code now uses the new unified API as you see below, to generate the code, OWB needs a valid connection to a real schema, this was not needed before 11gR2 and is a new requirement since the OLAP API which OWB uses is not an offline one. Once all of the objects are deployed and the maps executed then we get to the fun stuff! How can we analyze the data? One option which is powerful and at many users' fingertips is using Microsoft Excel PivotTables® and PivotCharts®, which can be used with your Oracle OLAP business intelligence data by utilizing Simba's MDX Provider for Oracle OLAP (see Simba site for details of cost). I'll leave the exotic reporting illustrations to the experts (see Bud's demonstration here), but with Simba's MDX Provider for Oracle OLAP its very simple to easily access the analytics stored in the database (all built and loaded via the OWB 11gR2 release) and get the regular features of Excel at your fingertips such as using the conditional formatting features for example. That's a very quick run through of the OWB 11gR2 with respect to Oracle 11g OLAP integration and the reporting using Simba's MDX Provider for Oracle OLAP. Not a deep-dive in any way but a quick overview to illustrate the design capabilities and integrations possible.

    Read the article

  • Oracle Warehouse Builder 11gR2 Windows-ra is

    - by Fekete Zoltán
    A héten megjelent az Oracle Database 11g Release 2 Windows platformra is, így lett teljes a kép a legfontosabb szerver operációs rendszerek körében, ezáltal az OWB kliens is hozzáférheto lett Windows-on. Az OWB az Oracle piacvezeto ETL eszköze, extraction, transformation, load - adatkinyerés, betöltés és átalakítás. Az Oracle Warehouse Builder Java-s kliens programja eddig is elérheto volt Linuxon, most már supportáltan megvan Windows-ra is (kis hegesztéssel eddig is lehetett a Linux-os Java-s változatot használni Windows-on). Az OWB vindózos kliens kétféle módon érheto el: - a Database 11gR2 Windows install készlet telepítésével automatikusan felkerül, letöltés - önállóan is felrakható más gépre (standalon), letöltés, itt a Linux kliens is megtalálható. Ez a standalone verzió most jelent meg az OTN-en 2-3 órája. :)

    Read the article

  • OWB – OWBLand on SourceForge

    - by David Allan
    There are a bunch of interesting utilities that are either experts or OMB scripts that are hosted on SourceForge by some keen OWB users (see the home here). One of the main initiatives has been an Excel to OWB ‘one click ETL’ utility, which looks to have had a fair amount of code added, there is an example but its kinda light on documentation, but does look like it covers quite a lot. One of the nice things about SourceForge is that you can peek into the statistics and see what kind of activity has gone on, from last August there have been a bunch of downloads with a big peak last November… Another utility that is there is one to generate OMB from a mapping definition, a bunch of useful stuff there - http://sourceforge.net/projects/owbland/files/

    Read the article

  • OWB és heterogén adatforrások, Oracle Magazine, 2010. május-június

    - by Fekete Zoltán
    Megjelent az Oracle Magazine aktuális száma (naná, az aktuális számnak ez a dolga. Oracle Magazine, 2010. május-június. Ebben a számban sok érdekes cikk közül válogathatunk: cloud computing, Java, .Net, új generációs backup, párhuzamosság és PL/SQL, OWB,... Ajánlom a Business Intelligence - Oracle Warehouse Builder 11g Release 2 and Heterogeneous Databases cikket, melyben megtudhatjuk, hogyan használhatunk heterogén adatforrásokat az Oracle Warehouse Builder ETL-ELT eszközzel, hogyan tudunk például SQL Serverhez csatlakozni, és nagy teljesítménnyel adatokat kinyerni. Az Oracle adatintegrációs weblapja. Ez a gazdag heterogenitás az OWB az Oracle Data Integrator testvér termékbol jön. Az adatintegrációs SOD azt mondja, hogy ez a két Java alapú termék, az OWB és az ODI egy termékben fognak egyesülni.

    Read the article

  • SSIS is Case-Sensitive

    - by andyleonard
    Introduction SSIS is case-sensitive even if the database is case-insensitive. Imagine... ... you work in an ETL shop where someone who believes in natural keys won the Battle of the Joins. Imagine one of your natural keys is a string. (I know it's a stretch... play along!). Let's build some tables to sketch it out. If you do not have a TestDB database, why not? Build one! You'll use it often. Use TestDB go Create Table SSIS1 ( StrID char ( 5 ) , Name varchar ( 15 ) , Value int ) Insert Into SSIS1...(read more)

    Read the article

  • OWB 11gR2 &ndash; Degenerate Dimensions

    - by David Allan
    Ever wondered how to build degenerate dimensions in OWB and get the benefits of slowly changing dimensions and cube loading? Now its possible through some changes in 11gR2 to make the dimension and cube loading much more flexible. This will let you get the benefits of OWB's surrogate key handling and slowly changing dimension reference when loading the fact table and need degenerate dimensions (see Ralph Kimball's degenerate dimensions design tip). Here we will see how to use the cube operator to load slowly changing, regular and degenerate dimensions. The cube and cube operator can now work with dimensions which have no surrogate key as well as dimensions with surrogates, so you can get the benefit of the cube loading and incorporate the degenerate dimension loading. What you need to do is create a dimension in OWB that is purely used for ETL metadata; the dimension itself is never deployed (its table is, but has not data) it has no surrogate keys has a single level with a business attribute the degenerate dimension data and a dummy attribute, say description just to pass the OWB validation. When this degenerate dimension is added into a cube, you will need to configure the fact table created and set the 'Deployable' flag to FALSE for the foreign key generated to the degenerate dimension table. The degenerate dimension reference will then be in the cube operator and used when matching. Create the degenerate dimension using the regular wizard. Delete the Surrogate ID attribute, this is not needed. Define a level name for the dimension member (any name). After the wizard has completed, in the editor delete the hierarchy STANDARD that was automatically generated, there is only a single level, no need for a hierarchy and this shouldn't really be created. Deploy the implementing table DD_ORDERNUMBER_TAB, this needs to be deployed but with no data (the mapping here will do a left outer join of the source data with the empty degenerate dimension table). Now, go ahead and build your cube, use the regular TIMES dimension for example and your degenerate dimension DD_ORDERNUMBER, can add in SCD dimensions etc. Configure the fact table created and set Deployable to false, so the foreign key does not get generated. Can now use the cube in a mapping and load data into the fact table via the cube operator, this will look after surrogate lookups and slowly changing dimension references.   If you generate the SQL you will see the ON clause for matching includes the columns representing the degenerate dimension columns. Here we have seen how this use case for loading fact tables using degenerate dimensions becomes a whole lot simpler using OWB 11gR2. I'm sure there are other use cases where using this mix of dimensions with surrogate and regular identifiers is useful, Fact tables partitioned by date columns is another classic example that this will greatly help and make the cube operator much more useful. Good to hear any comments.

    Read the article

  • What is the right way to process inconsistent data files?

    - by Tahabi
    I'm working at a company that uses Excel files to store product data, specifically, test results from products before they are shipped out. There are a few thousand spreadsheets with anywhere from 50-100 relevant data points per file. Over the years, the schema for the spreadsheets has changed significantly, but not unidirectionally - in the sense that, changes often get reverted and then re-added in the space of a few dozen to few hundred files. My project is to convert about 8000 of these spreadsheets into a database that can be queried. I'm using MongoDB to deal with the inconsistency in the data, and Python. My question is, what is the "right" or canonical way to deal with the huge variance in my source files? I've written a data structure which stores the data I want for the latest template, which will be the final template used going forward, but that only helps for a few hundred files historically. Brute-forcing a solution would mean writing similar data structures for each version/template - which means potentially writing hundreds of schemas with dozens of fields each. This seems very inefficient, especially when sometimes a change in the template is as little as moving a single line of data one row down or splitting what used to be one data field into two data fields. A slightly more elegant solution I have in mind would be writing schemas for all the variants I can find for pre-defined groups in the source files, and then writing a function to match a particular series of files with a series of variants that matches that set of files. This is because, more often that not, most of the file will remain consistent over a long period, only marred by one or two errant sections, but inside the period, which section is inconsistent, is inconsistent. For example, say a file has four sections with three data fields, which is represented by four Python dictionaries with three keys each. For files 7000-7250, sections 1-3 will be consistent, but section 4 will be shifted one row down. For files 7251-7500, 1-3 are consistent, section 4 is one row down, but a section five appears. For files 7501-7635, sections 1 and 3 will be consistent, but section 2 will have five data fields instead of three, section five disappears, and section 4 is still shifted down one row. For files 7636-7800, section 1 is consistent, section 4 gets shifted back up, section 2 returns to three cells, but section 3 is removed entirely. Files 7800-8000 have everything in order. The proposed function would take the file number and match it to a dictionary representing the data mappings for different variants of each section. For example, a section_four_variants dictionary might have two members, one for the shifted-down version, and one for the normal version, a section_two_variants might have three and five field members, etc. The script would then read the matchings, load the correct mapping, extract the data, and insert it into the database. Is this an accepted/right way to go about solving this problem? Should I structure things differently? I don't know what to search Google for either to see what other solutions might be, though I believe the problem lies in the domain of ETL processing. I also have no formal CS training aside from what I've taught myself over the years. If this is not the right forum for this question, please tell me where to move it, if at all. Any help is most appreciated. Thank you.

    Read the article

  • File Sync Solution for Batch Processing (ETL)

    - by KenFar
    I'm looking for a slightly different kind of sync utility - not one designed to keep two directories identical, but rather one intended to keep files flowing from one host to another. The context is a data warehouse that currently has a custom-developed solution that moves 10,000 files a day, some of which are 1+ gbytes gzipped files, between linux servers via ssh. Files are produced by the extract process, then moved to the transform server where a transform daemon is waiting to pick them up. The same process happens between transform & load. Once the files are moved they are typically archived on the source for a week, and the downstream process likewise moves them to temp then archive as it consumes them. So, my requirements & desires: It is never used to refresh updated files - only used to deliver new files. Because it's delivering files to downstream processes - it needs to rename the file once done so that a partial file doesn't get picked up. In order to simplify recovery, it should keep a copy of the source files - but rename them or move them to another directory. If the transfer fails (network down, file system full, permissions, file locked, etc), then it should retry periodically - and never fail in a non-recoverable way, or a way that sends the file twice or never sends the file. Should be able to copy files to 2+ destinations. Should have a consolidated log so that it's easy to find problems Should have an optional checksum feature Any recommendations? Can Unison do this well?

    Read the article

  • Setup automated calling during an ETL failure

    - by Ryan M
    Just started working on a large data warehouse project that runs some large ETLs every night. In the event of an error I receive an email, but I was hoping to somehow create something that will automatically call me, so I don't have to wake up and check my email at 4 every morning to make sure the ETLs finished properly. I know I can setup an SMS pretty easily, but I don't think that will be enough to wake me up :) Anyone have any experience trying to do this before?

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11  | Next Page >