Search Results

Search found 67143 results on 2686 pages for 'complex data types'.

Page 7/2686 | < Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14  | Next Page >

  • New Feature in ODI 11.1.1.6: ODI for Big Data

    - by Julien Testut
    Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";} By Ananth Tirupattur Starting with Oracle Data Integrator 11.1.1.6.0, ODI is offering a solution to process Big Data. This post provides an overview of this feature. With all the buzz around Big Data and before getting into the details of ODI for Big Data, I will provide a brief introduction to Big Data and Oracle Solution for Big Data. So, what is Big Data? Big data includes: structured data (this includes data from relation data stores, xml data stores), semi-structured data (this includes data from weblogs) unstructured data (this includes data from text blob, images) Traditionally, business decisions are based on the information gathered from transactional data. For example, transactional Data from CRM applications is fed to a decision system for analysis and decision making. Products such as ODI play a key role in enabling decision systems. However, with the emergence of massive amounts of semi-structured and unstructured data it is important for decision system to include them in the analysis to achieve better decision making capability. While there is an abundance of opportunities for business for gaining competitive advantages, process of Big Data has challenges. The challenges of processing Big Data include: Volume of data Velocity of data - The high Rate at which data is generated Variety of data In order to address these challenges and convert them into opportunities, we would need an appropriate framework, platform and the right set of tools. Hadoop is an open source framework which is highly scalable, fault tolerant system, for storage and processing large amounts of data. Hadoop provides 2 key services, distributed and reliable storage called Hadoop Distributed File System or HDFS and a framework for parallel data processing called Map-Reduce. Innovations in Hadoop and its related technology continue to rapidly evolve, hence therefore, it is highly recommended to follow information on the web to keep up with latest information. Oracle's vision is to provide a comprehensive solution to address the challenges faced by Big Data. Oracle is providing the necessary Hardware, software and tools for processing Big Data Oracle solution includes: Big Data Appliance Oracle NoSQL Database Cloudera distribution for Hadoop Oracle R Enterprise- R is a statistical package which is very popular among data scientists. ODI solution for Big Data Oracle Loader for Hadoop for loading data from Hadoop to Oracle. Further details can be found here: http://www.oracle.com/us/products/database/big-data-appliance/overview/index.html ODI Solution for Big Data: ODI’s goal is to minimize the need to understand the complexity of Hadoop framework and simplify the adoption of processing Big Data seamlessly in an enterprise. ODI is providing the capabilities for an integrated architecture for processing Big Data. This includes capability to load data in to Hadoop, process data in Hadoop and load data from Hadoop into Oracle. ODI is expanding its support for Big Data by providing the following out of the box Knowledge Modules (KMs). IKM File to Hive (LOAD DATA).Load unstructured data from File (Local file system or HDFS ) into Hive IKM Hive Control AppendTransform and validate structured data on Hive IKM Hive TransformTransform unstructured data on Hive IKM File/Hive to Oracle (OLH)Load processed data in Hive to Oracle RKM HiveReverse engineer Hive tables to generate models Using the Loading KM you can map files (local and HDFS files) to the corresponding Hive tables. For example, you can map weblog files categorized by date into a corresponding partitioned Hive table schema. Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";} Using the Hive control Append KM you can validate and transform data in Hive. In the below example, two source Hive tables are joined and mapped to a target Hive table. Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";} The Hive Transform KM facilitates processing of semi-structured data in Hive. In the below example, the data from weblog is processed using a Perl script and mapped to target Hive table. Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";} Using the Oracle Loader for Hadoop (OLH) KM you can load data from Hive table or HDFS to a corresponding table in Oracle. OLH is available as a standalone product. ODI greatly enhances OLH capability by generating the configuration and mapping files for OLH based on the configuration provided in the interface and KM options. ODI seamlessly invokes OLH when executing the scenario. In the below example, a HDFS file is mapped to a table in Oracle. Development and Deployment:The following diagram illustrates the development and deployment of ODI solution for Big Data. Using the ODI Studio on your development machine create and develop ODI solution for processing Big Data by connecting to a MySQL DB or Oracle database on a BDA machine or Hadoop cluster. Schedule the ODI scenarios to be executed on the ODI agent deployed on the BDA machine or Hadoop cluster. ODI Solution for Big Data provides several exciting new capabilities to facilitate the adoption of Big Data in an enterprise. You can find more information about the Oracle Big Data connectors on OTN. You can find an overview of all the new features introduced in ODI 11.1.1.6 in the following document: ODI 11.1.1.6 New Features Overview

    Read the article

  • Data recovery on a data HDD (no OS)

    - by aCuria
    I am helping a family member with a dead hard disk. It is a seagate 200Gb 3.5" HDD in one of those old-school external enclosures. The problem was that windows failed to detect the hard disk when plugged in through USB. I removed the hard disk from its enclosure, and plugged it into my desktop PC. The BIOS does detect it upon POST, but unfortunately windows 7 would refuse to boot. It will get stuck on the loading screen with the glowing windows logo. Safe mode doesn't help either. What options do I have before going for some professional data recovery? edit: Someone modified the Title to something completely different from what I was asking, i just changed it back. 1) 2 HDD drives, DiskA(Dead), DiskB(my OS disk) 2) when B is connected to my system, everything works fine 3) when A AND B is connected, failure to boot. POSTs fine, but windows wont load 4) A has NO OS, its PURE data. It came from an EXTERNAL HDD enclosure which doesnt belong to me, and im trying to do data recovery.

    Read the article

  • SOA Suite 11g Native Format Builder Complex Format Example

    - by bob.webster
    This rather long posting details the steps required to process a grouping of fixed length records using Format Builder.   If it’s 10 pm and you’re feeling beat you might want to leave this until tomorrow.  But if it’s 10 pm and you need to get a Format Builder Complex template done, read on… The goal is to process individual orders from a file using the 11g File Adapter and Format Builder Sample Data =========== 001Square Widget            0245.98 102Triagular Widget         1120.00 403Circular Widget           0099.45 ORD8898302/01/2011 301Hexagon Widget         1150.98 ORD6735502/01/2011 The records are fixed length records representing a number of logical Order records. Each order record consists of a number of item records starting with a 3 digit number, followed by a single Summary Record which starts with the constant ORD. How can this file be processed so that the first poll returns the first order? 001Square Widget            0245.98 102Triagular Widget         1120.00 403Circular Widget           0099.45 ORD8898302/01/2011 And the second poll returns the second order? 301Hexagon Widget           1150.98 ORD6735502/01/2011 Note: if you need more than one order per poll, that’s also possible, see the “Multiple Messages” field in the “File Adapter Step 6 of 9” snapshot further down.   To follow along with this example you will need - Studio Edition Version 11.1.1.4.0    with the   - SOA Extension for JDeveloper 11.1.1.4.0 installed Both can be downloaded from here:  http://www.oracle.com/technetwork/middleware/soasuite/downloads/index.html You will not need a running WebLogic Server domain to complete the steps and Format Builder tests in this article.     Start with a SOA Composite containing a File Adapter The Format Builder is part of the File Adapter so start by creating a new SOA Project and Composite. Here is a quick summary for those not familiar with these steps - Start JDeveloper - From the Main Menu choose File->New - In the New Gallery window that opens Expand the “General” category and Select the Applications node.   Then choose SOA Application from the Items section on the right.  Finally press the OK button. - In Step 1 of the “Create SOA Application wizard” that appears enter an Application Name and an Directory of your     choice,   then press the Next button. - In Step 2 of the “Create SOA Application wizard”, press the Next button leaving all entries as defaulted. - In Step 3 of the “Create SOA Application wizard”, Enter a composite name of your choice and Press the Finish   Button These steps result in a new Application and SOA Project. The SOA Project contains a composite.xml file which is opened and shown below. For our example we have not defined a Mediator or a BPEL process to minimize the steps, but one or the other would eventually be needed to use the File Adapter we are about to create. Drag and drop the File Adapter icon from the Component Pallette onto either the LEFT side of the diagram under “Exposed Services” or the right side under “External References”.  (See the Green Circle in the image below).  Placing the adapter on the left side would indicate the file being processed is inbound to the composite, if the adapter is placed on the right side then the data is outbound to a file.     Note that the same Format Builder definition can be used in both directions.  For example we could use the format with a File Adapter on the left side of the composite to parse fixed data into XML, modify the data in our Composite or BPEL process and then use the same Format Builder definition with a File adapter on the right side of the composite to write the data back out in the same fixed data format When the File Adapter is dropped on the Composite the File Adapter Wizard Appears. Skip Past the first page, Step 1 of 9 by pressing the Next button. In Step 2 enter a service name of your choice as shown below, then press Next   When the Native Format Builder appears, skip the welcome page by pressing next. Also press the Next button to accept the settings on Step 3 of 9 On Step 4, select Read File and press the Next button as shown below.   On Step 5 enter a directory that will contain a file with the input data, then  Press the Next button as shown below. In step 6, enter *.txt or another file format to select input files from the input directory mentioned in step 5. ALSO check the “Files contain Multiple Messages” checkbox and set the “Publish Messages in Batches of” field to 1.  The value can be set higher to increase the number of logical order group records returned on each poll of the file adapter.  In other words, it determines the number of Orders that will be sent to each instance of a Mediator or Composite processing using the File Adapter.   Skip Step 7 by pressing the Next button In Step 8 press the Gear Icon on the right side to load the Native Format Builder.       Native Format Builder  appears Before diving into the format, here is an overview of the process. Approach - Bottom up Assuming an Order is a grouping of item records and a summary record…. - Define a separate  Complex Type for each Record Type found in the group.    (One for itemRecord and one for summaryRecord) - Define a Complex Type to contain the Group of Record types defined above   (LogicalOrderRecord) - Define a top level element to represent an order.  (order)   The order element will be of type LogicalOrderRecord   Defining the Format In Step 1 select   “Create new”  and  “Complex Type” and “Next”   In Step two browse to and select a file containing the test data shown at the start of this article. A link is provided at the end of this article to download a file containing the test data. Press the Next button     In Step 3 Complex types must be define for each type of input record. Select the Root-Element and Click on the Add Complex Type icon This creates a new empty complex type definition shown below. The fastest way to create the definition is to highlight the first line of the Sample File data and drag the line onto the  <new_complex_type> Format Builder introspects the data and provides a grid to define additional fields. Change the “Complex Type Name” to  “itemRecord” Then click on the ruler to indicate the position of fixed columns.  Drag the red triangle icons to the exact columns if necessary. Double click on an existing red triangle to remove an unwanted entry. In the case below fields are define in columns 0-3, 4-28, 29-eol When the field definitions are correct, press the “Generate Fields” button. Field entries named C1, C2 and C3 will be created as shown below. Click on the field names and rename them from C1->itemNum, C2->itemDesc and C3->itemCost  When all the fields are correctly defined press OK to save the complex type.        Next, the process is repeated to define a Complex Type for the SummaryRecord. Select the Root-Element in the schema tree and press the new complex type icon Then highlight and drag the Summary Record from the sample data onto the <new_complex_type>   Change the complex type name to “summaryRecord” Mark the fixed fields for Order Number and Order Date. Press the Generate Fields button and rename C1 and C2 to itemNum and orderDate respectively.   The last complex type to be defined is a type to hold the group of items and the summary record. Select the Root-Element in the schema tree and click the new complex type icon Select the “<new_complex_type>” entry and click the pencil icon   On the Complex Type Details page change the name and type of each input field. Change line 1 to be named item and set the Type  to “itemRecord” Change line 2 to be named summary and set the Type to “summaryRecord” We also need to indicate that itemRecords repeat in the input file. Click the pencil icon at the right side of the item line. On the Edit Details page change the “Max Occurs” entry from 1 to UNBOUNDED. We also need to indicate how to identify an itemRecord.  Since each item record has “.” in column 32 we can use this fact to differentiate an item record from a summary record. Change the “Look Ahead” field to value 32 and enter a period in the “Look For” field Press the OK button to save entry.     Finally, its time to create a top level element to represent an order. Select the “Root-Element” in the schema tree and press the New element icon Click on the <new_element> and press the pencil icon.   Set the Element Name to “order” and change the Data Type to “logicalOrderRecord” Press the OK button to save the element definition.   The final definition should match the screenshot below. Press the Next Button to view the definition source.     Press the Test Button to test the definition   Press the Green Triangle Icon to run the test.   And we are presented with an unwelcome error. The error states that the processor ran out of data while working through the definition. The processor was unable to differentiate between itemRecords and summaryRecords and therefore treated the entire file as a list of itemRecords.  At end of file, the “summary” portion of the logicalOrderRecord remained unprocessed but mandatory.   This root cause of this error is the loss of our “lookAhead” definition used to identify itemRecords. This appears to be a bug in the  Native Format Builder 11.1.1.4.0 Luckily, a simple workaround exists. Press the Cancel button and return to the “Step 4 of 4” Window. Manually add    nxsd:lookAhead="32" nxsd:lookFor="."   attributes after the maxOccurs attribute of the item element. as shown in the highlighted text below.   When the lookAhead and lookFor attributes have been added Press the Test button and on the Test page press the Green Triangle. The test is now successful, the first order in the file is returned by the File Adapter.     Below is a complete listing of the Result XML from the right column of the screen above   Try running it The downloaded input test file and completed schema file can be used for testing without following all the Native Format Builder steps in this example. Use the following link to download a file containing the sample data. Download Sample Input Data This is the best approach rather than cutting and pasting the input data at the top of the article.  Since the data is fixed length it’s very important to watch out for trailing spaces in the data and to ensure an eol character at the end of every line. The download file is correctly formatted. The final schema definition can be downloaded at the following link Download Completed Schema Definition   - Save the inputData.txt file to a known location like the xsd folder in your project. - Save the inputData_6.xsd file to the xsd folder in your project. - At step 1 in the Native Format Builder wizard  (as shown above) check the “Edit existing” radio button,    then browse and select the inputData_6.xsd file - At step 2 of the Format Builder configuration Wizard (as shown above) supply the path and filename for    the inputData.txt file. - You can then proceed to the test page and run a test. - Remember the wizard bug will drop the lookAhead and lookFor attributes,  you will need to manually add   nxsd:lookAhead="32" nxsd:lookFor="."    after the maxOccurs attribute of the item element in the   LogicalOrderRecord Complex Type.  (as shown above)   Good Luck with your Format Project

    Read the article

  • Encode complex number as RGB pixel and back

    - by Vi
    How is it better to encode a complex number into RGB pixel and vice versa? Probably (logarithm of) an absolute value goes to brightness and an argument goes to hue. Desaturated pixes should receive randomized argument in reverse transformation. Something like: 0 - (0,0,0) 1 - (255,0,0) -1 - (0,255,255) 0.5 - (128,0,0) i - (255,255,0) -i - (255,0,255) (0,0,0) - 0 (255,255,255) - e^(i * random) (128,128,128) - 0.5 * e^(i *random) (0,128,128) - -0.5 Are there ready-made formulas for that? Edit: Looks like I just need to convert RGB to HSB and back. Edit 2: Existing RGB - HSV converter fragment: if (hsv.sat == 0) { hsv.hue = 0; // ! return hsv; } I don't want 0. I want random. And not just if hsv.sat==0, but if it is lower that it should be ("should be" means maximum saturation, saturation that is after transformation from complex number).

    Read the article

  • Multiplying complex with constant in C++

    - by Atilla Filiz
    The following code fails to compile #include <iostream> #include <cmath> #include <complex> using namespace std; int main(void) { const double b=3; complex <double> i(0, 1), comp; comp = b*i; comp=3*i; return 0; } with error: no match for ‘operator*’ in ‘3 * i’ What is wrong here, why cannot I multiply with immediate constants?

    Read the article

  • 2D grid with multiple types of objects

    - by Alexandre P. Levasseur
    This is my first post here in programmers.stackexchange (I'm a regular on SO). I hope this isn't too general. I'm trying a simple project to learn Java from something I've seen done in the past. Basically, it's an AI simulation where there are herbivorous and carnivorous creatures and both must try to survive. The part I am trying to come up with is that of the board itself. Let's assume very simple rules. The board must be of size X by Y and only one element can be in one place at one time. For example, a critter cannot be in the same tile as a food block. There can be obstacles (rocks, trees..), there can be food, there can be critters of any type. Assuming these rules, what would be one good way to represent this situation ? This is what I came up with and want suggestions if possible: Use multiple levels of inheritance to represent all the different possible objects (AbstractObject - (NonMovingObject - (Food, Obstacle) , MovingObject - Critter - (Carnivorous, Herbivorous))) and use polymorphism in a 2D array to store the instances and still have access to lower level methods. Many thanks. Edit: Here is the graphic representation of the structure I have in mind.

    Read the article

  • Data Governance 2010 Conference in San Diego

    - by Tony Ouk
    The Data Governance Annual Conference is one of the world's most authoritative and vendor neutral event on Data Governance and Data Quality.  The conference will focus on the "how-tos" from starting a data governance and stewardship program to attaining data governance maturity with specific topics on MDM.  This year's event will be hosted June 7 through June 10 in San Diego, California. For more information, including registration details, visit the Data Governance 2010 Conference website.

    Read the article

  • How to search for newline or linebreak characters in Excel?

    - by Highly Irregular
    I've imported some data into Excel (from a text file) and it contains some sort of newline characters. It looks like this initially: If I hit F2 (to edit) then Enter (to save changes) on each of the cells with a newline (without actually editing anything), Excel automatically changes the layout to look like this: I don't want these newlines characters here, as it messes up data processing further down the track. How can I do a search for these to detect more of them? The usual search function doesn't accept an enter character as a search character.

    Read the article

  • Oracle Data Integration 12c: Simplified, Future-Ready, High-Performance Solutions

    - by Thanos Terentes Printzios
    In today’s data-driven business environment, organizations need to cost-effectively manage the ever-growing streams of information originating both inside and outside the firewall and address emerging deployment styles like cloud, big data analytics, and real-time replication. Oracle Data Integration delivers pervasive and continuous access to timely and trusted data across heterogeneous systems. Oracle is enhancing its data integration offering announcing the general availability of 12c release for the key data integration products: Oracle Data Integrator 12c and Oracle GoldenGate 12c, delivering Simplified and High-Performance Solutions for Cloud, Big Data Analytics, and Real-Time Replication. The new release delivers extreme performance, increase IT productivity, and simplify deployment, while helping IT organizations to keep pace with new data-oriented technology trends including cloud computing, big data analytics, real-time business intelligence. With the 12c release Oracle becomes the new leader in the data integration and replication technologies as no other vendor offers such a complete set of data integration capabilities for pervasive, continuous access to trusted data across Oracle platforms as well as third-party systems and applications. Oracle Data Integration 12c release addresses data-driven organizations’ critical and evolving data integration requirements under 3 key themes: Future-Ready Solutions : Supporting Current and Emerging Initiatives Extreme Performance : Even higher performance than ever before Fast Time-to-Value : Higher IT Productivity and Simplified Solutions  With the new capabilities in Oracle Data Integrator 12c, customers can benefit from: Superior developer productivity, ease of use, and rapid time-to-market with the new flow-based mapping model, reusable mappings, and step-by-step debugger. Increased performance when executing data integration processes due to improved parallelism. Improved productivity and monitoring via tighter integration with Oracle GoldenGate 12c and Oracle Enterprise Manager 12c. Improved interoperability with Oracle Warehouse Builder which enables faster and easier migration to Oracle Data Integrator’s strategic data integration offering. Faster implementation of business analytics through Oracle Data Integrator pre-integrated with Oracle BI Applications’ latest release. Oracle Data Integrator also integrates simply and easily with Oracle Business Analytics tools, including OBI-EE and Oracle Hyperion. Support for loading and transforming big and fast data, enabled by integration with big data technologies: Hadoop, Hive, HDFS, and Oracle Big Data Appliance. Only Oracle GoldenGate provides the best-of-breed real-time replication of data in heterogeneous data environments. With the new capabilities in Oracle GoldenGate 12c, customers can benefit from: Simplified setup and management of Oracle GoldenGate 12c when using multiple database delivery processes via a new Coordinated Delivery feature for non-Oracle databases. Expanded heterogeneity through added support for the latest versions of major databases such as Sybase ASE v 15.7, MySQL NDB Clusters 7.2, and MySQL 5.6., as well as integration with Oracle Coherence. Enhanced high availability and data protection via integration with Oracle Data Guard and Fast-Start Failover integration. Enhanced security for credentials and encryption keys using Oracle Wallet. Real-time replication for databases hosted on public cloud environments supported by third-party clouds. Tight integration between Oracle Data Integrator 12c and Oracle GoldenGate 12c and other Oracle technologies, such as Oracle Database 12c and Oracle Applications, provides a number of benefits for organizations: Tight integration between Oracle Data Integrator 12c and Oracle GoldenGate 12c enables developers to leverage Oracle GoldenGate’s low overhead, real-time change data capture completely within the Oracle Data Integrator Studio without additional training. Integration with Oracle Database 12c provides a strong foundation for seamless private cloud deployments. Delivers real-time data for reporting, zero downtime migration, and improved performance and availability for Oracle Applications, such as Oracle E-Business Suite and ATG Web Commerce . Oracle’s data integration offering is optimized for Oracle Engineered Systems and is an integral part of Oracle’s fast data, real-time analytics strategy on Oracle Exadata Database Machine and Oracle Exalytics In-Memory Machine. Oracle Data Integrator 12c and Oracle GoldenGate 12c differentiate the new offering on data integration with these many new features. This is just a quick glimpse into Oracle Data Integrator 12c and Oracle GoldenGate 12c. Find out much more about the new release in the video webcast "Introducing 12c for Oracle Data Integration", where customer and partner speakers, including SolarWorld, BT, Rittman Mead will join us in launching the new release. Resource Kits Meet Oracle Data Integration 12c  Discover what's new with Oracle Goldengate 12c  Oracle EMEA DIS (Data Integration Solutions) Partner Community is available for all your questions, while additional partner focused webcasts will be made available through our blog here, so stay connected. For any questions please contact us at partner.imc-AT-beehiveonline.oracle-DOT-com Stay Connected Oracle Newsletters

    Read the article

  • Building a Data Mart with Pentaho Data Integration Video Review by Diethard Steiner, Packt Publishing

    - by Compudicted
    Originally posted on: http://geekswithblogs.net/Compudicted/archive/2014/06/01/building-a-data-mart-with-pentaho-data-integration-video-review.aspx The Building a Data Mart with Pentaho Data Integration Video by Diethard Steiner from Packt Publishing is more than just a course on how to use Pentaho Data Integration, it also implements and uses the principals of the Data Warehousing (and I even heard the name of Ralph Kimball in the video). Indeed, a video watcher should be familiar with its concepts as the Star Schema, Slowly Changing Dimension types, etc. so I suggest prior to watching this course to consider skimming through the Data Warehouse concepts (if unfamiliar) or even better, read the excellent Ralph’s The Data Warehouse Tooolkit. By the way, the author expands beyond using Pentaho along to MySQL and MonetDB which is a real icing on the cake! Indeed, I even suggest the name of the course should be ‘Building a Data Warehouse with Pentaho’. To successfully complete the course one needs to know some Linux (Ubuntu used in the course), the VI editor and the Bash command shell, but it seems that similar requirements would also apply to the Weindows OS. Additionally, knowing some basic SQL would not hurt. As I had said, MonetDB is used in this course several times which seems to be not anymore complex than say MySQL, but based on what I read is very well suited for fast querying big volumes of data thanks to having a columnstore (vertical data storage). I don’t see what else can be a barrier, the material is very digestible. On this note, I must add that the author does not cover how to acquire the software, so here is what I found may help: Pentaho: the free Community Edition must be more than anyone needs to learn it. Or even go into a POC. MonetDB can be downloaded (exists for both, Linux and Windows) from http://goo.gl/FYxMy0 (just see the appropriate link on the left). The author seems to be using Eclipse to run SQL code, one can get it from http://goo.gl/5CcuN. To create, or edit database entities and/or schema otherwise one can use a universal tool called SQuirreL, get it from http://squirrel-sql.sourceforge.net.   Next, I must confess Diethard is very knowledgeable in what he does and beyond. However, there will be some accent heard to the user of the course especially if one’s mother tongue language is English, but it I got over it in a few chapters. I liked the rate at which the material is being presented, it makes me feel I paid for every second Eventually, my impressions are: Pentaho is an awesome ETL offering, it is worth learning it very much (I am an ETL fan and a heavy user of SSIS) MonetDB is nice, it tickles my fancy to know it more Data Warehousing, despite all the BigData tool offerings (Hive, Scoop, Pig on Hadoop), using the traditional tools still rocks Chapters 2 to 6 were the most fun to me with chapter 8 being the most difficult.   In terms of closing, I highly recommend this video to anyone who needs to grasp Pentaho concepts quick, likewise, the course is very well suited for any developer on a “supposed to be done yesterday” type of a project. It is for a beginner to intermediate level ETL/DW developer. But one would need to learn more on Data Warehousing and Pentaho, for such I recommend the 5 star Pentaho Data Integration 4 Cookbook. Enjoy it! Disclaimer: I received this video from the publisher for the purpose of a public review.

    Read the article

  • Building a Data Mart with Pentaho Data Integration Video Review by Diethard Steiner, Packt Publishing

    - by Compudicted
    Originally posted on: http://geekswithblogs.net/Compudicted/archive/2014/06/01/building-a-data-mart-with-pentaho-data-integration-video-review-again.aspx The Building a Data Mart with Pentaho Data Integration Video by Diethard Steiner from Packt Publishing is more than just a course on how to use Pentaho Data Integration, it also implements and uses the principals of the Data Warehousing (and I even heard the name of Ralph Kimball in the video). Indeed, a video watcher should be familiar with its concepts as the Star Schema, Slowly Changing Dimension types, etc. so I suggest prior to watching this course to consider skimming through the Data Warehouse concepts (if unfamiliar) or even better, read the excellent Ralph’s The Data Warehouse Tooolkit. By the way, the author expands beyond using Pentaho along to MySQL and MonetDB which is a real icing on the cake! Indeed, I even suggest the name of the course should be ‘Building a Data Warehouse with Pentaho’. To successfully complete the course one needs to know some Linux (Ubuntu used in the course), the VI editor and the Bash command shell, but it seems that similar requirements would also apply to the Windows OS. Additionally, knowing some basic SQL would not hurt. As I had said, MonetDB is used in this course several times which seems to be not anymore complex than say MySQL, but based on what I read is very well suited for fast querying big volumes of data thanks to having a columnstore (vertical data storage). I don’t see what else can be a barrier, the material is very digestible. On this note, I must add that the author does not cover how to acquire the software, so here is what I found may help: Pentaho: the free Community Edition must be more than anyone needs to learn it. Or even go into a POC. MonetDB can be downloaded (exists for both, Linux and Windows) from http://goo.gl/FYxMy0 (just see the appropriate link on the left). The author seems to be using Eclipse to run SQL code, one can get it from http://goo.gl/5CcuN. To create, or edit database entities and/or schema otherwise one can use a universal tool called SQuirreL, get it from http://squirrel-sql.sourceforge.net.   Next, I must confess Diethard is very knowledgeable in what he does and beyond. However, there will be some accent heard to the user of the course especially if one’s mother tongue language is English, but it I got over it in a few chapters. I liked the rate at which the material is being presented, it makes me feel I paid for every second Eventually, my impressions are: Pentaho is an awesome ETL offering, it is worth learning it very much (I am an ETL fan and a heavy user of SSIS) MonetDB is nice, it tickles my fancy to know it more Data Warehousing, despite all the BigData tool offerings (Hive, Scoop, Pig on Hadoop), using the traditional tools still rocks Chapters 2 to 6 were the most fun to me with chapter 8 being the most difficult.   In terms of closing, I highly recommend this video to anyone who needs to grasp Pentaho concepts quick, likewise, the course is very well suited for any developer on a “supposed to be done yesterday” type of a project. It is for a beginner to intermediate level ETL/DW developer. But one would need to learn more on Data Warehousing and Pentaho, for such I recommend the 5 star Pentaho Data Integration 4 Cookbook. Enjoy it! Disclaimer: I received this video from the publisher for the purpose of a public review.

    Read the article

  • Internal Mutation of Persistent Data Structures

    - by Greg Ros
    To clarify, when I mean use the terms persistent and immutable on a data structure, I mean that: The state of the data structure remains unchanged for its lifetime. It always holds the same data, and the same operations always produce the same results. The data structure allows Add, Remove, and similar methods that return new objects of its kind, modified as instructed, that may or may not share some of the data of the original object. However, while a data structure may seem to the user as persistent, it may do other things under the hood. To be sure, all data structures are, internally, at least somewhere, based on mutable storage. If I were to base a persistent vector on an array, and copy it whenever Add is invoked, it would still be persistent, as long as I modify only locally created arrays. However, sometimes, you can greatly increase performance by mutating a data structure under the hood. In more, say, insidious, dangerous, and destructive ways. Ways that might leave the abstraction untouched, not letting the user know anything has changed about the data structure, but being critical in the implementation level. For example, let's say that we have a class called ArrayVector implemented using an array. Whenever you invoke Add, you get a ArrayVector build on top of a newly allocated array that has an additional item. A sequence of such updates will involve n array copies and allocations. Here is an illustration: However, let's say we implement a lazy mechanism that stores all sorts of updates -- such as Add, Set, and others in a queue. In this case, each update requires constant time (adding an item to a queue), and no array allocation is involved. When a user tries to get an item in the array, all the queued modifications are applied under the hood, requiring a single array allocation and copy (since we know exactly what data the final array will hold, and how big it will be). Future get operations will be performed on an empty cache, so they will take a single operation. But in order to implement this, we need to 'switch' or mutate the internal array to the new one, and empty the cache -- a very dangerous action. However, considering that in many circumstances (most updates are going to occur in sequence, after all), this can save a lot of time and memory, it might be worth it -- you will need to ensure exclusive access to the internal state, of course. This isn't a question about the efficacy of such a data structure. It's a more general question. Is it ever acceptable to mutate the internal state of a supposedly persistent or immutable object in destructive and dangerous ways? Does performance justify it? Would you still be able to call it immutable? Oh, and could you implement this sort of laziness without mutating the data structure in the specified fashion?

    Read the article

  • Sabre Manages Fast Data Growth with Oracle Data Integration Products

    - by Irem Radzik
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;} Last year at OpenWorld we announced Sabre Holding as a winner of the Fusion Middleware Innovation Awards. The Sabre team did an excellent job at leveraging cutting edge technologies for managing rapid data growth and exponential scalability demands they have experienced in the travel industry. Today we announced the details and specific benefits of Sabre’s new real-time data integration solution in a press release. Please take a look if you haven’t seen it yet. Sabre Holdings Deploys Oracle Data Integrator and Oracle GoldenGate to Support Rapid Customer Growth There are 3 different areas of benefits Sabre achieved by using Oracle Data Integration products: Manages 7X increase in data sources for the enterprise data warehouse Reduced infrastructure complexity Decreased time to market for new products and services by 30 percent. This simply shows that using latest technologies helps the companies to innovate robust solutions against today’s key data management challenges. And the benefit of using a next generation data integration technology is not only seen in the IT operations, but also in the business side. A better data integration solution for the enterprise data warehouse delivered the platform they need to accelerate how they service their customers, improving their competitive advantage. Tomorrow I will give another great example of innovation with next generation data integration from Oracle. We will be discussing the Fusion Middleware Innovation Awards 2012 winners and their results with using Oracle’s data integration products.

    Read the article

  • implementing dynamic query handler on historical data

    - by user2390183
    EDIT : Refined question to focus on the core issue Context: I have historical data about property (house) sales collected from various sources in a centralized/cloud data source (assume info collection is handled by a third party) Planning to develop an application to query and retrieve data from this centralized data source Example Queries: Simple : for given XYZ post code, what is average house price for 3 bed room house? Complex: What is estimated price for an house at "DD,Some Street,XYZ Post Code" (worked out from average values of historic data filtered by various characteristics of the house: house post code, no of bed rooms, total area, and other deeper insights like house building type, year of built, features)? In addition to average price, the application should support other property info ** maximum, or minimum price..etc and trend (graph) on a selected property attribute over a period of time**. Hence, the queries should not enforce the search based on a primary key or few fixed fields In other words, queries can be What is the change in 3 Bed Room house price (irrespective of location) over last 30 days? What kind of properties we can get for X price (irrespective of location or house type) The challenge I have is identifying the domain (BI/ Data Analytical or DB Design or DB Query Interface or DW related or something else) this problem (dynamic query on historic data) belong to, so that I can do further exploration My findings so far I could be wrong on the following, so please correct me if you think so I briefly read about BI/Data Analytics - I think it is heavy weight solution for my problem and has scalability issues. DB Design - As I understand RDBMS works well if you know Data model at design time. I am expecting attributes about property or other entity (user) that am going to bring in, would evolve quickly. hence maintenance would be an issue. As I am going to have multiple users executing query at same time, performance would be a bottleneck Other options like Graph DB (http://www.tinkerpop.com/) seems to be bit complex (they are good. but using those tools meant for generic purpose, make me think like assembly programming to solve my problem ) BigData related solution are to analyse data from multiple unrelated domains So, Any suggestion on the space this problem fit in ? (Especially if you have design/implementation experience of back-end for property listing or similar portals)

    Read the article

  • AngularJS dealing with large data sets (Strategy)

    - by Brian
    I am working on developing a personal temperature logging viewer based on my rasppi curl'ing data into my web server's api. Temperatures are taken every 2 seconds and I can have several temperature sensors posting data. Needless to say I will have a lot of data to handle even within the scope of an hour. I have implemented a very simple paging api from the server so the server doesn't timeout and is currently only returning data in 1000 units per call, then paging through the data. I had the idea to intially show say the last 20 minutes of data from a sensor (or all sensors depending on user choices), then allowing the user to select other timeframes from which to show data. The issue comes in when you want to view all sensors or an extended time period (say 24 hours). Is there a best practice of handling this large amount of data? Would it be useful to load those first 20 minutes into the live view and then cache into local storage something like the last 24 hours? I haven't been able to find a decent idea of this in use yet even though there are a lot of ways to take this problem. I am just looking for some suggestions as to what might provide a good balance between good performance and not caching the entire data set on the client side (as beyond a week of data this might not be feasible).

    Read the article

  • I need some help creating a non-binary tree (or some other data structure that will better solve my problem)

    - by EDO
    I have about ten lists of numbers and some strings. Each list has about <= 30K lines. Each line on a list has a distinct number. I need to build an efficient way of finding all the lines in each list that has the same 'control' number (or key for dB guys) and comparing what is in their string parts. I am writing this in Java. I have thought about using trees but my brain cells are about burnt now. I need some help.

    Read the article

  • replacing data.frame element-wise operations with data.table (that used rowname)

    - by Harold
    So lets say I have the following data.frames: df1 <- data.frame(y = 1:10, z = rnorm(10), row.names = letters[1:10]) df2 <- data.frame(y = c(rep(2, 5), rep(5, 5)), z = rnorm(10), row.names = letters[1:10]) And perhaps the "equivalent" data.tables: dt1 <- data.table(x = rownames(df1), df1, key = 'x') dt2 <- data.table(x = rownames(df2), df2, key = 'x') If I want to do element-wise operations between df1 and df2, they look something like dfRes <- df1 / df2 And rownames() is preserved: R> head(dfRes) y z a 0.5 3.1405463 b 1.0 1.2925200 c 1.5 1.4137930 d 2.0 -0.5532855 e 2.5 -0.0998303 f 1.2 -1.6236294 My poor understanding of data.table says the same operation should look like this: dtRes <- dt1[, !'x', with = F] / dt2[, !'x', with = F] dtRes[, x := dt1[,x,]] setkey(dtRes, x) (setkey optional) Is there a more data.table-esque way of doing this? As a slightly related aside, more generally, I would have other columns such as factors in each data.table and I would like to omit those columns while doing the element-wise operations, but still have them in the result. Does this make sense? Thanks!

    Read the article

  • PHP - post data ends when '&' is in data.

    - by Phil Jackson
    Hi all, im posting data using jquery/ajax and PHP at the backend. Problem being, when I input something like 'Jack & Jill went up the hill' im only recieving 'Jack' when it gets to the backend. I have thrown an error at the frontend before that data is sent which alerts 'Jack & Jill went up the hill'. When I put die(print_r($_POST)); at the very top of my index page im only getting [key] => Jack how can I be loosing the data? I thought It may have been my filter; <?php function filter( $data ) { $data = trim( htmlentities( strip_tags( mb_convert_encoding( $data, 'HTML-ENTITIES', "UTF-8") ) ) ); if ( get_magic_quotes_gpc() ) { $data = stripslashes( $data ); } //$data = mysql_real_escape_string( $data ); return $data; } echo "<xmp>" . filter("you & me") . "</xmp>"; ?> but that returns fine in the test above you &amp; me which is in place after I added die(print_r($_POST));. Can anyone think of how and why this is happening? Any help much appreciated. Regards, Phil.

    Read the article

  • SQL Developer Debugging, Watches, Smart Data, & Data

    - by thatjeffsmith
    After presenting the SQL Developer PL/SQL debugger for about an hour yesterday at KScope12 in San Antonio, my boss came up and asked, “Now, would you really want to know what the Smart Data panel does?” Apparently I had ‘made up’ my own story about what that panel’s intent is based on my experience with it. Not good Jeff, not good. It was a very small point of my presentation, but I probably should have read the docs. The Smart Data tab displays information about variables, using your Debugger: Smart Data preferences. You can also specify these preferences by right-clicking in the Smart Data window and selecting Preferences. Debugger Smart Data Preferences, control number of variables to display The Smart Data panel auto-inspects the last X accessed variables. So if you have a program with 26 variables, instead of showing you all 26, it will just show you the last two variables that were referenced in your program. If you were to click on the ‘Data’ debug panel, you’ll see EVERYTHING. And if you only want to see a very specific set of values, then you should use Watches. The Smart Data Panel As I step through the code, the variables being tracked change as they are referenced. Only the most recent ones display. This is controlled by the ‘Maximum Locations to Remember’ preference. Step through the code, see the latest variables accessed The Data Panel All variables are displayed. Might be information overload on large PL/SQL programs where you have many dozens or even hundreds of variables to track. Shows everything all the time Watches Watches are added manually and only show what you ask for. Data on Demand – add a watch to track a specific variable Remember, you can interact with your data If you want to do more than just watch, you can mouse-right on a data element, and change the value of the variable as the program is running. This is one of the primary benefits to debugging over using DBMS_OUTPUT to track what’s happening in your program. Change the values while the program is running to test your ‘What if?’ scenarios

    Read the article

  • SQL – Biggest Concerns in a Data-Driven World

    - by Pinal Dave
    The ongoing chaos over Government Agency’s snooping has ignited a heated debate on privacy of personal data and its use by government and/or other institutions. It has created a feeling of disapproval and distrust among users. This incident proves to be a lesson for companies that are looking to leverage their business using a data driven approach. According to analysts, the goal of gathering personal information should be to deliver benefits to both the parties – the user as well as the data collector(government or business). Using data the right way is crucial, and companies need to deploy the right software applications and systems to ensure that their efforts are well-directed. However, there are various issues plaguing analysts regarding available software, which are highlighted below. According to a InformationWeek 2013 Survey of Analytics, Business Intelligence and Information Management where 541 business technology professionals contributed as respondents, it was discovered that the biggest concern was deemed to be the scarcity of expertise and high costs associated with the same. This concern was voiced by as many as 38% of the participants. A close second came out to be the issue of data warehouse appliance platforms being expensive, with 33% of those present believing it to be a huge roadblock. Another revelation made in this respect was that 31% professionals weren’t even sure how Data Analytics can create business opportunities for them. Another 17% shared that they found data platform technologies such as Hadoop and NoSQL technologies hard to learn. These results clearly pointed out that there are awareness and expertise issues that also need much attention. Unless the demand-supply gap of Business Intelligence professionals well versed in data analysis technologies is met, this divide is going to affect how companies make the most of their BI campaigns. One of the key action points that can be taken to salvage the situation, is to provide training on Data Analytics concepts. Koenig Solutions offer courses on many such technologies including a course on MCSE SQL Server 2012: BI Platform. So it’s time to brush up your skills and get down to work in a data driven world that awaits you ahead. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Store XML data in Core Data

    - by ct2k7
    Hi, is there any easy way of store XML data into core data? Currently, my app just pulls the values from the XML file directly, however, this isn't efficient for XML files which holds over 100 entries, thus storing the data in Core Data would be the best option. XML file is called/downloaded/parsed ever time the app opens. With the Core Data, the XML data would be downloaded ever 3600 seconds or so, and refresh the current data in the core data, to reduce the loading time when opening the app. Any ideas on how I can do this? Having reviewed the developer documentation, it doesn't look very tasty.

    Read the article

  • Where can I find free and open data?

    - by kitsune
    Sooner or later, coders will feel the need to have access to "open data" in one of their projects, from knowing a city's zip to a more obscure information such as the axial tilt of Pluto. I know data.un.org which offers access to the UN's extensive array of databases that deal with human development and other socio-economic issues. The other usual suspects are NASA and the USGS for planetary data. There's an article at readwriteweb with more links. infochimps.org seems to stand out. Personally, I need to find historic commodity prices, stock values and other financial data. All these data sets seem to cost money however. Clarification To clarify, I'm interested in all kinds of open data, because sooner or later, I know I will be in a situation where I could need it. I will try to edit this answer and include the suggestions in a structured manners. A link for financial data was hidden in that readwriteweb article, doh! It's called opentick.com. Looks good so far! Update I stumbled over semantic data in another question of mine on here. There is opencyc ('the world's largest and most complete general knowledge base and commonsense reasoning engine'). A project called UMBEL provides a light-weight, distilled version of opencyc. Umbel has semantic data in rdf/owl/skos n3 syntax. The Worldbank also released a very nice API. It offers data from the last 50 years for about 200 countries

    Read the article

  • Temporary storage for keeping data between program iterations?

    - by mr.b
    I am working on an application that works like this: It fetches data from many sources, resulting in pool of about 500,000-1,500,000 records (depends on time/day) Data is parsed Part of data is processed in a way to compare it to pre-existing data (read from database), calculations are made, and stored in database. Resulting dataset that has to be stored in database is, however, much smaller in size (compared to original data set), and ranges from 5,000-50,000 records. This process almost always updates existing data, perhaps adds few more records. Then, data from step 2 should be kept somehow, somewhere, so that next time data is fetched, there is a data set which can be used to perform calculations, without touching pre-existing data in database. I should point out that this data can be lost, it's not irreplaceable (key information can be read from database if needed), but it would speed up the process next time. Application components can (and will be) run off different computers (in the same network), so storage has to be reachable from multiple hosts. I have considered using memcached, but I'm not quite sure should I do so, because one record is usually no smaller than 200 bytes, and if I have 1,500,000 records, I guess that it would amount to over 300 MB of memcached cache... But that doesn't seem scalable to me - what if data was 5x that amount? If it were to consume 1-2 GB of cache only to keep data in between iterations (which could easily happen)? So, the question is: which temporary storage mechanism would be most suitable for this kind of processing? I haven't considered using mysql temporary tables, as I'm not sure if they can persist between sessions, and be used by other hosts in network... Any other suggestion? Something I should consider?

    Read the article

  • Why are data structures so important in interviews?

    - by Vamsi Emani
    I am a newbie into the corporate world recently graduated in computers. I am a java/groovy developer. I am a quick learner and I can learn new frameworks, APIs or even programming languages within considerably short amount of time. Albeit that, I must confess that I was not so strong in data structures when I graduated out of college. Through out the campus placements during my graduation, I've witnessed that most of the biggie tech companies like Amazon, Microsoft etc focused mainly on data structures. It appears as if data structures is the only thing that they expect from a graduate. Adding to this, I see that there is this general perspective that a good programmer is necessarily a one with good knowledge about data structures. To be honest, I felt bad about that. I write good code. I follow standard design patterns of coding, I do use data structures but at the superficial level as in java exposed APIs like ArrayLists, LinkedLists etc. But the companies usually focused on the intricate aspects of Data Structures like pointer based memory manipulation and time complexities. Probably because of my java-ish background, Back then, I understood code efficiency and logic only when talked in terms of Object Oriented Programming like Objects, instances, etc but I never drilled down into the level of bits and bytes. I did not want people to look down upon me for this knowledge deficit of mine in Data Structures. So really why all this emphasis on Data Structures? Does, Not having knowledge in Data Structures really effect one's career in programming? Or is the knowledge in this subject really a sufficient basis to differentiate a good and a bad programmer?

    Read the article

  • Data structure for pattern matching.

    - by alvonellos
    Let's say you have an input file with many entries like these: date, ticker, open, high, low, close, <and some other values> And you want to execute a pattern matching routine on the entries(rows) in that file, using a candlestick pattern, for example. (See, Doji) And that pattern can appear on any uniform time interval (let t = 1s, 5s, 10s, 1d, 7d, 2w, 2y, and so on...). Say a pattern matching routine can take an arbitrary number of rows to perform an analysis and contain an arbitrary number of subpatterns. In other words, some patterns may require 4 entries to operate on. Say also that the routine (may) later have to find and classify extrema (local and global maxima and minima as well as inflection points) for the ticker over a closed interval, for example, you could say that a cubic function (x^3) has the extrema on the interval [-1, 1]. (See link) What would be the most natural choice in terms of a data structure? What about an interface that conforms a Ticker object containing one row of data to a collection of Ticker so that an arbitrary pattern can be applied to the data. What's the first thing that comes to mind? I chose a doubly-linked circular linked list that has the following methods: push_front() push_back() pop_front() pop_back() [] //overloaded, can be used with negative parameters But that data structure seems very clumsy, since so much pushing and popping is going on, I have to make a deep copy of the data structure before running an analysis on it. So, I don't know if I made my question very clear -- but the main points are: What kind of data structures should be considered when analyzing sequential data points to conform to a pattern that does NOT require random access? What kind of data structures should be considered when classifying extrema of a set of data points?

    Read the article

< Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14  | Next Page >