Search Results

Search found 401 results on 17 pages for 'hadoop'.

Page 15/17 | < Previous Page | 11 12 13 14 15 16 17 | Next Page >

Tackling Big Data Analytics with Oracle Data Integrator

- by Irem Radzik

v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);} Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Times New Roman","serif"; mso-fareast-font-family:"Times New Roman";} By Mike Eisterer The term big data draws a lot of attention, but behind the hype there's a simple story. For decades, companies have been making business decisions based on transactional data stored in relational databases. Beyond that critical data, however, is a potential treasure trove of less structured data: weblogs, social media, email, sensors, and documents that can be mined for useful information. Companies are facing emerging technologies, increasing data volumes, numerous data varieties and the processing power needed to efficiently analyze data which changes with high velocity. Oracle offers the broadest and most integrated portfolio of products to help you acquire and organize these diverse data sources and analyze them alongside your existing data to find new insights and capitalize on hidden relationships Oracle Data Integrator Enterprise Edition(ODI) is critical to any enterprise big data strategy. ODI and the Oracle Data Connectors provide native access to Hadoop, leveraging such technologies as MapReduce, HDFS and Hive. Alongside with ODI’s metadata driven approach for extracting, loading and transforming data; companies may now integrate their existing data with big data technologies and deliver timely and trusted data to their analytic and decision support platforms. In this session, you’ll learn about ODI and Oracle Big Data Connectors and how, coupled together, they provide the critical integration with multiple big data platforms. Tackling Big Data Analytics with Oracle Data Integrator October 1, 2012 12:15 PM at MOSCONE WEST – 3005 For other data integration sessions at OpenWorld, please check our Focus-On document. If you are not able to attend OpenWorld, please check out our latest resources for Data Integration.

Read the article
SQL – Biggest Concerns in a Data-Driven World

- by Pinal Dave

The ongoing chaos over Government Agency’s snooping has ignited a heated debate on privacy of personal data and its use by government and/or other institutions. It has created a feeling of disapproval and distrust among users. This incident proves to be a lesson for companies that are looking to leverage their business using a data driven approach. According to analysts, the goal of gathering personal information should be to deliver benefits to both the parties – the user as well as the data collector(government or business). Using data the right way is crucial, and companies need to deploy the right software applications and systems to ensure that their efforts are well-directed. However, there are various issues plaguing analysts regarding available software, which are highlighted below. According to a InformationWeek 2013 Survey of Analytics, Business Intelligence and Information Management where 541 business technology professionals contributed as respondents, it was discovered that the biggest concern was deemed to be the scarcity of expertise and high costs associated with the same. This concern was voiced by as many as 38% of the participants. A close second came out to be the issue of data warehouse appliance platforms being expensive, with 33% of those present believing it to be a huge roadblock. Another revelation made in this respect was that 31% professionals weren’t even sure how Data Analytics can create business opportunities for them. Another 17% shared that they found data platform technologies such as Hadoop and NoSQL technologies hard to learn. These results clearly pointed out that there are awareness and expertise issues that also need much attention. Unless the demand-supply gap of Business Intelligence professionals well versed in data analysis technologies is met, this divide is going to affect how companies make the most of their BI campaigns. One of the key action points that can be taken to salvage the situation, is to provide training on Data Analytics concepts. Koenig Solutions offer courses on many such technologies including a course on MCSE SQL Server 2012: BI Platform. So it’s time to brush up your skills and get down to work in a data driven world that awaits you ahead. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Where can I hire local programmers with very specific skillsets?

- by Lostsoul

I have been browsing the site and haven't found a exact fit to this question so I'll post it but if its already answered(since I'm sure its a common problem, then let me know). I have a business and want to create a totally different product in a different industry than I'm currently in, so I learned how to program and created a working prototype. I have a bit of savings and am getting some cash flow from my current business so I can go out and hire a developer(in the future hopefully it can be permenant but right now I just need a person willing to work on contract and code on weekends or their spare time and I just want to pay in cash instead of equity or future promises). At first I wasn't sure what kind of developer to hire but this question helped me understand I should target specific skills I need as opposed to general programmers. This poses a problem for me since general programmers are everywhere but if I want specific skills I'm unsure how to get them. I thought about a list of approaches but it doesn't feel complete or effective since it seems to be assuming good developers are actively looking. If it helps I want someone local(since this is my first developer hire) and looking for skills like cuda, hadoop, hbase, java and c. Any suggestions? As a FYI, I have been thinking of approaching it as: Go to meet ups for one or more skills I need. Use LinkedIn to find people with the skills I need Search for job postings that contain skills I need and then use linkedIn to reach out to that firms employees since many profiles on linkedin are not very updated or detailed but job postings generally are. Send postings to universities and maybe find a student who loves technology so much they learned these tools on their own. Post on job board. Not sure how successful it will be to post to monster. Use Craigslist, not sure if a highly skilled developer would go here for work. What am I missing? I could be wrong but it seems like good/smart/able developers aren't hunting for work non-stop(especially in this tech job market). Plus most successful people I know have work/life balance so I'm not sure if the best ones really care about code after work. Lastly, most of the skills I need aren't used in big corporations so not sure how aggressively smart developers at small shops look for work. I don’t really know any developers personally, so but should I be using the above plan or if they live balanced lives should I be looking outside of the regular resources(and instead focus on asking around my gym or my accountant or something)? Sorry, I'm making huge assumptions here, I guess because developers are a total mystery to me. I kind of wish Jane Goodall wrote a book on understanding developers social behaviour better :-p

Read the article
PASS Summit for SQL Starters

- by Davide Mauri

I’ve received a buch of emails from PASS Summit “First Timers” that are also somehow new to SQL Server (for “somehow” I mean people with less than 6 month experience but with some basic knowledge of SQL Server engine) or are catching up from SQL Server 2000. The common question regards the session one should not miss to have a broad view of the entire SQL Server platform have some insight into some specific areas of SQL Server Given that I’m on (semi-)vacantion and that I have more free time (not true, I have to prepare slides & demos for several conferences, PASS Summit - Building the Agile Data Warehouse with SQL Server 2012 - and PASS 24H - Agile Data Warehousing with SQL Server 2012 - among them…but let’s pretend it to be true), I’ve decided to make a post to answer to this common questions. Of course this is my personal point of view and given the fact that the number and quality of session that will be delivered at PASS Summit is so high that is very difficoult to make a choice, fell free to jump into the discussion and leave your feedback or – even better – answer with another post. I’m sure it will be very helpful to all the SQL Server beginners out there. I’ve imposed to myself to choose 6 session at maximum for each Track. Why 6? Because it’s the maximum number of session you can follow in one day, and given that all the session will be on the Summit DVD, they are the answer to the following question: “If I have one day to spend in training, which session I should watch?”. Of course a Summit is not like a Course so a lot of very basics concept of well-established technologies won’t be found here. Analysis Services, Integration Services, MDX are not part of the Summit this time (at least for the basic part of them). Enough with that, let’s start with the session list ideal to have a good Overview of all the SQL Server Platform: Geospatial Data Types in SQL Server 2012 Inside Unstructured Data: SQL Server 2012 FileTable and Semantic Search XQuery and XML in SQL Server: Common Problems and Best Practice Solutions Microsoft's Big Play for Big Data Dashboards: When to Choose Which MSBI Tool Microsoft BI End-User Tools 360° for what concern Database Development, I recommend the following sessions Understanding Transaction Isolation Levels What to Look for in Execution Plans Improve Query Performance by Fixing Bad Parameter Sniffing A Window into Your Data: Using SQL Window Functions Practical Uses and Optimization of New T-SQL Features in SQL Server 2012 Taking MERGE Beyond the Basics For Business Intelligence Information Delivery Analyzing SSAS Data with Excel Building Compelling Power View Reports Managed Self-Service BI PowerPivot 101 SharePoint for Business Intelligence The Best Microsoft BI Tools You've Never Heard Of and for Business Intelligence Architecture & Development BI Power Hour Building a Tabular Model Database Enterprise Information Management: Bringing Together SSIS, DQS, and MDS SSIS Design Patterns Storing Columnstore Indexes Hadoop and Its Ecosystem Components in Action Beside the listed sessions, First Timers should also take a look the the page PASS set up for them: http://www.sqlpass.org/summit/2012/Connect/FirstTimers.aspx See you at PASS Summit!

Read the article
Fast Data Executive Round Table FY14 event kit

- by JuergenKress

We are very interested to run joint marketing events jointly with you as our partners! At our SOA Community Workspace (SOA Community membership required) you can find a new Fast Data Executive Round Table FY14 event kit. This event is designed at senior IT and executives level for the purposes of education, awareness, and thought leadership around the subject of big data; and a specific flavor of big data - Fast Data - that has begun to spark the imagination of many Oracle customers. Fast Data is not new. It’s a term that was invented initially by Ovum’s Tony Baer as a way to represent the collection of ‘high velocity’ solutions with respect to the big data. For Oracle, the Fast Data campaign in FY13 began as a way to tie a broader set of solutions together (SOA/Business Process Management, Data Integration and Business Analytics) under a set of use cases focused on real-time, high velocity data. It has helped to give Oracle a leap-frog advantage over many of the niche integration vendors (i.e. Informatica, Pega, Tibco, Software AG, Terracotta) who haven’t been able to address these types of end-to-end use cases which rely on the combination of filtering, in-memory data processing, correlation, real-time data movement and transformation, end-to-end analytics, and business process management. Only Oracle can address all the dimensions of fast data, and only Oracle can provide a set of engineered solutions to address this space. This event is designed to continue that thought leadership momentum and raise the awareness about what Oracle Fast Data solutions are designed to solve. It’s designed to highlight real customer solutions and articulate the business benefits that fast data can address. This is not an event that gets into the esoteric technical standards of Hadoop, NoSQL, and in-memory data grids. This is an event that instead gets into the heart of business problems that big data has left un-addressed and charts the path for next steps in fast data. Get the Fast Data Executive Round Table FY14 event kit here. Support marketing campaigns We can support such events by: Oracle speakers - contact your partner manager Marketing budget - contact your A&C marketing manager Event location - free use of Oracle Customer Visitor Centers conference rooms Promote your event at events.oracle.com: http://tinyurl.com/eventspecialized E-Blast: invite customers to your event – contact your A&C marketing manager For additional marketing kits e.g for Business Process Managementplease visit our SOA Community Workspace. SOA & BPM Partner Community For regular information on Oracle SOA Suite become a member in the SOA & BPM Partner Community for registration please visit www.oracle.com/goto/emea/soa (OPN account required) If you need support with your account please contact the Oracle Partner Business Center. Blog Twitter LinkedIn Facebook Wiki Mix Forum Technorati Tags:

Read the article
Keeping your options open in a cloud solution

- by BuckWoody

In on-premises solutions we have the full range of options open for a given computing solution – but we don’t always take advantage of them, for multiple reasons. Data goes in a Relational Database Management System, files go on a share, and e-mail goes to the Exchange server. Over time, vendors (including ourselves) add in functionality to one product that allow non-standard use of the platform. For example, SQL Server (and Oracle, and others) allow large binary storage in or through the system – something not originally intended for an RDBMS to handle. There are certainly times when this makes sense, of course, but often these platform hammers turn every problem into a nail. It can make us “lazy” in our design – we sometimes don’t take the time to learn another architecture because the one we’ve spent so much time with can handle what we want to do. But there’s a distinct danger here. In nature, when a population shares too many of the same traits, it can cause a complete collapse if a situation exploits a weakness shared by that population. The same is true with not using the righttool for the job in a computing environment. Your company or organization depends on your knowledge as a professional to select the best mix of supportable, flexible, cost-effective technologies to solve their problems, whether you’re in an architect role or not. So take some time today to learn something new. The way I do this is to select a given problem, and try to solve it with a technology I’m not familiar with. For instance – create a Purchase Order system in Excel, then in Hadoop or MongoDB, or even in flat-files using PowerShell as an interface. No, I’m not suggesting any of these architectures are the proper way to solve the PO problem, but taking something concrete that you know well and applying that meta-knowledge to another platform will assist you in exercising the “little grey cells” and help you and your organization understand what is open to you. And of course you can do all of this on-premises – but my recommendation is to check out a cloud platform (my suggestion would of course be Windows Azure :) ) and try it there. Most providers (including Microsoft) provide free time to do that.

Read the article
PXE boot and DHCP server configuration Failing Auto Installation

- by Harihara Vinayakaram

I have a ISC DHCP Server installed on Ubuntu 9.10 . I have managed to successfully boot a PXE client , obtain a DHCP address and load the initrd.gz file. But I am facing a vague problem when the debian installer starts up and tries to get a DHCP server The client send a DHCP request and I verified that is the same MAC Address. But I get a DHCP DECLINE (The client declines the address ). It offers all the address in the pool and then there is a DHCP NAK (no more free leases ) I tried using the Option no-ping, and also option one-client-one-lease but it does not help . If I set the client to use a fixed-address then the above problem is not there and the installation proceeds smoothly Can you give me any clues on what should be the DHCP server configuration My dhcpd.conf looks like this { ddns-update-style none; option domain-name "hadoop-myorg.org"; option domain-name-servers 192.168.3.5; default-lease-time 600; max-lease-time 7200; group { filename "pxelinux.0"; next-server 192.168.13.184; host hadoop1 { hardware ethernet 90:e6:ba:d5:53:f8; } } subnet 192.168.13.0 netmask 255.255.255.0 { option routers 10.0.0.254; pool { option domain-name-servers 192.168.3.5; max-lease-time 3000; range 192.168.13.55 192.168.13.65; deny unknown-clients; } } }

Read the article
Migrating to Amazon AWS etc: What key statistics/questions should be analyzed and asked?

- by cerd

I searched SOverflow pretty extensively for something similar to this set of questions. BACKGROUND: We are a growing 'big(ish)' data chemical data company that are outgrowing our lab and our dedicated production workhorses. Make no mistake, we need to do some serious query optimization. Our data (It comes from a certain govt. agency so the schema and lack of indexing is atrocious). So yes, I know, AWS or EC2 is not a silver bullet in the face of spending time to maybe rework your queries/code entirely 'out of the box'. With that said I would appreciate any input on the following questions: We produce on CentOS and lab on Ubuntu LTS which I prefer especially with their growing cloud / AWS integration. If we are mysql centric, and our biggest problem is these big cartesian products that produce slow queries, should we roll out what we know after more optimization with respect to Ubuntu/mySQL with the added Amazon horsepower? Or is there some merit to the NoSQL and other technologies they offer? What are the key metrics I need to gather from apache and mysql other than like: Disk I/O operations, Data up/down avgs and trends and special high usage periods/scenarios? I've reviewed AWS/EC2 fine print, but want 2nd opinions. What other services aside from the basic web/database have proven valuable to you? I know nothing of Hadoop or many other technologies they offer, echoing my prev. question, do you sometimes find it worth it (Initially having it be a gamble aside from basic homework) to dive/break into a whole new environment and try to/or end up finding a way of more efficiently producing your data/site product? Anything I should watch out for in projecting costs, or any other general advice when working with AWS folks from anyone else where your company is very niche and very very technical (Scientifically - or anybody for that matter)? Thanks very much for your input - I think this thread could be valuable to others as well.

Read the article
GlusterFS on VMWare ESXi 5

- by Dharmavir

I want to build network file system on top of my VMWare ESXi based virtual nodes which are running Ubuntu 12.04 LTS. I am evalaluating options and found that GlusterFS (http://www.gluster.org/) can turn out to be a good choice. Purpose: I have about 2 dozen VM nodes with different configurations, on 2 physical nodes which has following configuration: 16 core Intel Xeon 1 TB 48 GB RAM Now as I said earlier each Physical server has about 1TB hdd and I can increase if I want additional so for now I have 2TB disk space available, these space is distributed in VM nodes I have created on which about 2 dozen VM nodes live. Now some of them being application server and mgmt server, they have plenty of free disk space which I want to utilize for some heavy storage which I can not design if I do that individually on single VM node. This way if my storage is distributed between dozens of VM nodes and about 2 or more physical nodes I have some sort of backup as well. I do not mind if data gets stored redundently but per my knowledge it might hapeen that individual VM nodes will not be able to store all of the data because complete data size for example if we take 100GB will exceed VM disk size of 70GB and then VM will also have system and program files on it. I need some suggestion that will GlusterFS be the solution for which I am looking forward to or I should go with something like hadoop? I am not too sure. But yes, I would like to utilize my free space on each VM node and while doing that if I get store data redundently I am okay because it will give me data security.

Read the article
Unable to force Debian to do unattended install... libc6 wants interactive confirm

- by JD Long

I'm trying to create a script that forces a Debian Lenny install to install the latest version of CRAN R. During the install it appears libc6 is upgraded and the install wants interactive confirm that it's OK to restart three services (mysql, exim4, cron). This process HAS to be unattended as it runs on Amazon's Elastic Map Reduce (EMR) machines. But I'm running out of options. Here's a few things I've tried: This previous question appears to be exactly what I'm looking for. So I set up my install script as follows: # set my CRAN repos... yes, I know there's a new convention where to put these. echo "deb http://cran.r-project.org/bin/linux/debian lenny-cran/" | sudo tee -a /etc/apt/sources.list echo "deb-src http://cran.r-project.org/bin/linux/debian lenny-cran/" | sudo tee -a /etc/apt/sources.list # set the dpkg.cfg options per the previous SuperUser question echo "force-confold" | sudo tee -a /etc/dpkg/dpkg.cfg echo "force-confdef" | sudo tee -a /etc/dpkg/dpkg.cfg export DEBIAN_FRONTEND=noninteractive # add key to keyring so it doesn't complain gpg --keyserver pgp.mit.edu --recv-key 381BA480 gpg -a --export 381BA480 > jranke_cran.asc sudo apt-key add jranke_cran.asc sudo apt-get update # install the latest R sudo apt-get install --yes --force-yes r-base But this script hangs with the following request for input: OK, so I tried stopping the services using the following script: sudo /etc/init.d/mysql stop sudo /etc/init.d/exim4 stop sudo /etc/init.d/cron stop sudo apt-get install --yes --force-yes libc6 This does not work and the interactive screen comes back, but this time with only cron listed as the service that must be restarted. So is there a way to make libc6 just restart these services with no user input? Or is there a way to stop cron so it does not cause an interactive prompt? Maybe a creative option I've never thought of? Keep in mind that this system is brought up, some Hadoop code is run, and then it's torn down. So I can put up with side effects and bad behavior that we might not want in a production desktop machine or web server.

Read the article
Distributed storage and computing

- by Tim van Elteren

Dear Serverfault community, After researching a number of distributed file systems for deployment in a production environment with the main purpose of performing both batch and real-time distributed computing I've identified the following list as potential candidates, mainly on maturity, license and support: Ceph Lustre GlusterFS HDFS FhGFS MooseFS XtreemFS The key properties that our system should exhibit: an open source, liberally licensed, yet production ready, e.g. a mature, reliable, community and commercially supported solution; ability to run on commodity hardware, preferably be designed for it; provide high availability of the data with the most focus on reads; high scalability, so operation over multiple data centres, possibly on a global scale; removal of single points of failure with the use of replication and distribution of (meta-)data, e.g. provide fault-tolerance. The sensitivity points that were identified, and resulted in the following questions, are: transparency to the processing layer / application with respect to data locality, e.g. know where data is physically located on a server level, mainly for resource allocation and fast processing, high performance, how can this be accomplished? Do you from experience know what solutions provide this transparency and to what extent? posix compliance, or conformance, is mentioned on the wiki pages of most of the above listed solutions. The question here mainly is, how relevant is support for the posix standard? Hadoop for example isn't posix compliant by design, what are the pro's and con's? what about the difference between synchronous and asynchronous opeartion of a distributed file system. Though a synchronous distributed file system has the preference because of reliability it also imposes certain limitations with respect to scalability. What would be, from your expertise, the way to go on this? I'm looking forward to your replies. Thanks in advance! :) With kind regards, Tim van Elteren

Read the article
Big Data: Size isn’t everything

- by Simon Elliston Ball

Big Data has a big problem; it’s the word “Big”. These days, a quick Google search will uncover terabytes of negative opinion about the futility of relying on huge volumes of data to produce magical, meaningful insight. There are also many clichéd but correct assertions about the difficulties of correlation versus causation, in massive data sets. In reading some of these pieces, I begin to understand how climatologists must feel when people complain ironically about “global warming” during snowfall. Big Data has a name problem. There is a lot more to it than size. Shape, Speed, and…err…Veracity are also key elements (now I understand why Gartner and the gang went with V’s instead of S’s). The need to handle data of different shapes (Variety) is not new. Data developers have always had to mold strange-shaped data into our reporting systems, integrating with semi-structured sources, and even straying into full-text searching. However, what we lacked was an easy way to add semi-structured and unstructured data to our arsenal. New “Big Data” tools such as MongoDB, and other NoSQL (Not Only SQL) databases, or a graph database like Neo4J, fill this gap. Still, to many, they simply introduce noise to the clean signal that is their sensibly normalized data structures. What about speed (Velocity)? It’s not just high frequency trading that generates data faster than a single system can handle. Many other applications need to make trade-offs that traditional databases won’t, in order to cope with high data insert speeds, or to extract quickly the required information from data streams. Unfortunately, many people equate Big Data with the Hadoop platform, whose batch driven queries and job processing queues have little to do with “velocity”. StreamInsight, Esper and Tibco BusinessEvents are examples of Big Data tools designed to handle high-velocity data streams. Again, the name doesn’t do the discipline of Big Data any favors. Ultimately, though, does analyzing fast moving data produce insights as useful as the ones we get through a more considered approach, enabled by traditional BI? Finally, we have Veracity and Value. In many ways, these additions to the classic Volume, Velocity and Variety trio acknowledge the criticism that without high-quality data and genuinely valuable outputs then data, big or otherwise, is worthless. As a discipline, Big Data has recognized this, and data quality and cleaning tools are starting to appear to support it. Rather than simply decrying the irrelevance of Volume, we need as a profession to focus how to improve Veracity and Value. Perhaps we should just declare the ‘Big’ silent, embrace these new data tools and help develop better practices for their use, just as we did the good old RDBMS? What does Big Data mean to you? Which V gives your business the most pain, or the most value? Do you see these new tools as a useful addition to the BI toolbox, or are they just enabling a dangerous trend to find ghosts in the noise?

Read the article
Oracle Data Integration 12c: Simplified, Future-Ready, High-Performance Solutions

- by Thanos Terentes Printzios

In today’s data-driven business environment, organizations need to cost-effectively manage the ever-growing streams of information originating both inside and outside the firewall and address emerging deployment styles like cloud, big data analytics, and real-time replication. Oracle Data Integration delivers pervasive and continuous access to timely and trusted data across heterogeneous systems. Oracle is enhancing its data integration offering announcing the general availability of 12c release for the key data integration products: Oracle Data Integrator 12c and Oracle GoldenGate 12c, delivering Simplified and High-Performance Solutions for Cloud, Big Data Analytics, and Real-Time Replication. The new release delivers extreme performance, increase IT productivity, and simplify deployment, while helping IT organizations to keep pace with new data-oriented technology trends including cloud computing, big data analytics, real-time business intelligence. With the 12c release Oracle becomes the new leader in the data integration and replication technologies as no other vendor offers such a complete set of data integration capabilities for pervasive, continuous access to trusted data across Oracle platforms as well as third-party systems and applications. Oracle Data Integration 12c release addresses data-driven organizations’ critical and evolving data integration requirements under 3 key themes: Future-Ready Solutions : Supporting Current and Emerging Initiatives Extreme Performance : Even higher performance than ever before Fast Time-to-Value : Higher IT Productivity and Simplified Solutions With the new capabilities in Oracle Data Integrator 12c, customers can benefit from: Superior developer productivity, ease of use, and rapid time-to-market with the new flow-based mapping model, reusable mappings, and step-by-step debugger. Increased performance when executing data integration processes due to improved parallelism. Improved productivity and monitoring via tighter integration with Oracle GoldenGate 12c and Oracle Enterprise Manager 12c. Improved interoperability with Oracle Warehouse Builder which enables faster and easier migration to Oracle Data Integrator’s strategic data integration offering. Faster implementation of business analytics through Oracle Data Integrator pre-integrated with Oracle BI Applications’ latest release. Oracle Data Integrator also integrates simply and easily with Oracle Business Analytics tools, including OBI-EE and Oracle Hyperion. Support for loading and transforming big and fast data, enabled by integration with big data technologies: Hadoop, Hive, HDFS, and Oracle Big Data Appliance. Only Oracle GoldenGate provides the best-of-breed real-time replication of data in heterogeneous data environments. With the new capabilities in Oracle GoldenGate 12c, customers can benefit from: Simplified setup and management of Oracle GoldenGate 12c when using multiple database delivery processes via a new Coordinated Delivery feature for non-Oracle databases. Expanded heterogeneity through added support for the latest versions of major databases such as Sybase ASE v 15.7, MySQL NDB Clusters 7.2, and MySQL 5.6., as well as integration with Oracle Coherence. Enhanced high availability and data protection via integration with Oracle Data Guard and Fast-Start Failover integration. Enhanced security for credentials and encryption keys using Oracle Wallet. Real-time replication for databases hosted on public cloud environments supported by third-party clouds. Tight integration between Oracle Data Integrator 12c and Oracle GoldenGate 12c and other Oracle technologies, such as Oracle Database 12c and Oracle Applications, provides a number of benefits for organizations: Tight integration between Oracle Data Integrator 12c and Oracle GoldenGate 12c enables developers to leverage Oracle GoldenGate’s low overhead, real-time change data capture completely within the Oracle Data Integrator Studio without additional training. Integration with Oracle Database 12c provides a strong foundation for seamless private cloud deployments. Delivers real-time data for reporting, zero downtime migration, and improved performance and availability for Oracle Applications, such as Oracle E-Business Suite and ATG Web Commerce . Oracle’s data integration offering is optimized for Oracle Engineered Systems and is an integral part of Oracle’s fast data, real-time analytics strategy on Oracle Exadata Database Machine and Oracle Exalytics In-Memory Machine. Oracle Data Integrator 12c and Oracle GoldenGate 12c differentiate the new offering on data integration with these many new features. This is just a quick glimpse into Oracle Data Integrator 12c and Oracle GoldenGate 12c. Find out much more about the new release in the video webcast "Introducing 12c for Oracle Data Integration", where customer and partner speakers, including SolarWorld, BT, Rittman Mead will join us in launching the new release. Resource Kits Meet Oracle Data Integration 12c Discover what's new with Oracle Goldengate 12c Oracle EMEA DIS (Data Integration Solutions) Partner Community is available for all your questions, while additional partner focused webcasts will be made available through our blog here, so stay connected. For any questions please contact us at partner.imc-AT-beehiveonline.oracle-DOT-com Stay Connected Oracle Newsletters

Read the article
Big Data – Beginning Big Data – Day 1 of 21

- by Pinal Dave

What is Big Data? I want to learn Big Data. I have no clue where and how to start learning about it. Does Big Data really means data is big? What are the tools and software I need to know to learn Big Data? I often receive questions which I mentioned above. They are good questions and honestly when we search online, it is hard to find authoritative and authentic answers. I have been working with Big Data and NoSQL for a while and I have decided that I will attempt to discuss this subject over here in the blog. In the next 21 days we will understand what is so big about Big Data. Big Data – Big Thing! Big Data is becoming one of the most talked about technology trends nowadays. The real challenge with the big organization is to get maximum out of the data already available and predict what kind of data to collect in the future. How to take the existing data and make it meaningful that it provides us accurate insight in the past data is one of the key discussion points in many of the executive meetings in organizations. With the explosion of the data the challenge has gone to the next level and now a Big Data is becoming the reality in many organizations. Big Data – A Rubik’s Cube I like to compare big data with the Rubik’s cube. I believe they have many similarities. Just like a Rubik’s cube it has many different solutions. Let us visualize a Rubik’s cube solving challenge where there are many experts participating. If you take five Rubik’s cube and mix up the same way and give it to five different expert to solve it. It is quite possible that all the five people will solve the Rubik’s cube in fractions of the seconds but if you pay attention to the same closely, you will notice that even though the final outcome is the same, the route taken to solve the Rubik’s cube is not the same. Every expert will start at a different place and will try to resolve it with different methods. Some will solve one color first and others will solve another color first. Even though they follow the same kind of algorithm to solve the puzzle they will start and end at a different place and their moves will be different at many occasions. It is nearly impossible to have a exact same route taken by two experts. Big Market and Multiple Solutions Big Data is exactly like a Rubik’s cube – even though the goal of every organization and expert is same to get maximum out of the data, the route and the starting point are different for each organization and expert. As organizations are evaluating and architecting big data solutions they are also learning the ways and opportunities which are related to Big Data. There is not a single solution to big data as well there is not a single vendor which can claim to know all about Big Data. Honestly, Big Data is too big a concept and there are many players – different architectures, different vendors and different technology. What is Next? In this 31 days series we will be exploring many essential topics related to big data. I do not claim that you will be master of the subject after 31 days but I claim that I will be covering following topics in easy to understand language. Architecture of Big Data Big Data a Management and Implementation Different Technologies – Hadoop, Mapreduce Real World Conversations Best Practices Tomorrow In tomorrow’s blog post we will try to answer one of the very essential questions – What is Big Data? Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Fast Data: Go Big. Go Fast.

- by Dain C. Hansen

Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 For those of you who may have missed it, today’s second full day of Oracle OpenWorld 2012 started with a rumpus. Joe Tucci, from EMC outlined the human face of big data with real examples of how big data is transforming our world. And no not the usual tried-and-true weblog examples, but real stories about taxi cab drivers in Singapore using big data to better optimize their routes as well as folks just trying to get a better hair cut. Next we heard from Thomas Kurian who talked at length about the important platform characteristics of Oracle’s Cloud and more specifically Oracle’s expanded Cloud Services portfolio. Especially interesting to our integration customers are the messaging support for Oracle’s Cloud applications. What this means is that now Oracle’s Cloud applications have a lightweight integration fabric that on-premise applications can communicate to it via REST-APIs using Oracle SOA Suite. It’s an important element to our strategy at Oracle that supports this idea that whether your requirements are for private or public, Oracle has a solution in the Cloud for all of your applications and we give you more deployment choice than any vendor. If this wasn’t enough to get the juices flowing, later that morning we heard from Hasan Rizvi who outlined in his Fusion Middleware session the four most important enterprise imperatives: Social, Mobile, Cloud, and a brand new one: Fast Data. Today, Rizvi made an important step in the definition of this term to explain that he believes it’s a convergence of four essential technology elements: Event Processing for event filtering, business rules – with Oracle Event Processing Data Transformation and Loading - with Oracle Data Integrator Real-time replication and integration – with Oracle GoldenGate Analytics and data discovery – with Oracle Business Intelligence Each of these four elements can be considered (and architect-ed) together on a single integrated platform that can help customers integrate any type of data (structured, semi-structured) leveraging new styles of big data technologies (MapReduce, HDFS, Hive, NoSQL) to process more volume and variety of data at a faster velocity with greater results. Fast data processing (and especially real-time) has always been our credo at Oracle with each one of these products in Fusion Middleware. For example, Oracle GoldenGate continues to be made even faster with the recent 11g R2 Release of Oracle GoldenGate which gives us some even greater optimization to Oracle Database with Integrated Capture, as well as some new heterogeneity capabilities. With Oracle Data Integrator with Big Data Connectors, we’re seeing much improved performance by running MapReduce transformations natively on Hadoop systems. And with Oracle Event Processing we’re seeing some remarkable performance with customers like NTT Docomo. Check out their upcoming session at Oracle OpenWorld on Wednesday to hear more how this customer is using Event processing and Big Data together. If you missed any of these sessions and keynotes, not to worry. There's on-demand versions available on the Oracle OpenWorld website. You can also checkout our upcoming webcast where we will outline some of these new breakthroughs in Data Integration technologies for Big Data, Cloud, and Real-time in more details. /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";}

Read the article
Why your Netapp is so slow...

- by Darius Zanganeh

Have you ever wondered why your Netapp FAS box is slow and doesn't perform well at large block workloads? In this blog entry I will give you a little bit of information that will probably help you understand why it’s so slow, why you shouldn't use it for applications that read and write in large blocks like 64k, 128k, 256k ++ etc.. Of course since I work for Oracle at this time, I will show you why the ZS3 storage boxes are excellent choices for these types of workloads. Netapp’s Fundamental Problem The fundamental problem you have running these workloads on Netapp is the backend block size of their WAFL file system. Every application block on a Netapp FAS ends up in a 4k chunk on a disk. Reference: Netapp TR-3001 Whitepaper Netapp has proven this lacking large block performance fact in at least two different ways. They have NEVER posted an SPC-2 Benchmark yet they have posted SPC-1 and SPECSFS, both recently. In 2011 they purchased Engenio to try and fill this GAP in their portfolio. Block Size Matters So why does block size matter anyways? Many applications use large block chunks of data especially in the Big Data movement. Some examples are SAS Business Analytics, Microsoft SQL, Hadoop HDFS is even 64MB! Now let me boil this down for you. If an application such MS SQL is writing data in a 64k chunk then before Netapp actually writes it on disk it will have to split it into 16 different 4k writes and 16 different disk IOPS. When the application later goes to read that 64k chunk the Netapp will have to again do 16 different disk IOPS. In comparison the ZS3 Storage Appliance can write in variable block sizes ranging from 512b to 1MB. So if you put the same MSSQL database on a ZS3 you can set the specific LUNs for this database to 64k and then when you do an application read/write it requires only a single disk IO. That is 16x faster! But, back to the problem with your Netapp, you will VERY quickly run out of disk IO and hit a wall. Now all arrays will have some fancy pre fetch algorithm and some nice cache and maybe even flash based cache such as a PAM card in your Netapp but with large block workloads you will usually blow through the cache and still need significant disk IO. Also because these datasets are usually very large and usually not dedupable they are usually not good candidates for an all flash system. You can do some simple math in excel and very quickly you will see why it matters. Here are a couple of READ examples using SAS and MSSQL. Assume these are the READ IOPS the application needs even after all the fancy cache and algorithms. Here is an example with 128k blocks. Notice the numbers of drives on the Netapp! Here is an example with 64k blocks You can easily see that the Oracle ZS3 can do dramatically more work with dramatically less drives. This doesn't even take into account that the ONTAP system will likely run out of CPU way before you get to these drive numbers so you be buying many more controllers. So with all that said, lets look at the ZS3 and why you should consider it for any workload your running on Netapp today. ZS3 World Record Price/Performance in the SPC-2 benchmark ZS3-2 is #1 in Price Performance $12.08ZS3-2 is #3 in Overall Performance 16,212 MBPS Note: The number one overall spot in the world is held by an AFA 33,477 MBPS but at a Price Performance of $29.79. A customer could purchase 2 x ZS3-2 systems in the benchmark with relatively the same performance and walk away with $600,000 in their pocket.

Read the article
Building a Data Mart with Pentaho Data Integration Video Review by Diethard Steiner, Packt Publishing

- by Compudicted

Originally posted on: http://geekswithblogs.net/Compudicted/archive/2014/06/01/building-a-data-mart-with-pentaho-data-integration-video-review.aspx The Building a Data Mart with Pentaho Data Integration Video by Diethard Steiner from Packt Publishing is more than just a course on how to use Pentaho Data Integration, it also implements and uses the principals of the Data Warehousing (and I even heard the name of Ralph Kimball in the video). Indeed, a video watcher should be familiar with its concepts as the Star Schema, Slowly Changing Dimension types, etc. so I suggest prior to watching this course to consider skimming through the Data Warehouse concepts (if unfamiliar) or even better, read the excellent Ralph’s The Data Warehouse Tooolkit. By the way, the author expands beyond using Pentaho along to MySQL and MonetDB which is a real icing on the cake! Indeed, I even suggest the name of the course should be ‘Building a Data Warehouse with Pentaho’. To successfully complete the course one needs to know some Linux (Ubuntu used in the course), the VI editor and the Bash command shell, but it seems that similar requirements would also apply to the Weindows OS. Additionally, knowing some basic SQL would not hurt. As I had said, MonetDB is used in this course several times which seems to be not anymore complex than say MySQL, but based on what I read is very well suited for fast querying big volumes of data thanks to having a columnstore (vertical data storage). I don’t see what else can be a barrier, the material is very digestible. On this note, I must add that the author does not cover how to acquire the software, so here is what I found may help: Pentaho: the free Community Edition must be more than anyone needs to learn it. Or even go into a POC. MonetDB can be downloaded (exists for both, Linux and Windows) from http://goo.gl/FYxMy0 (just see the appropriate link on the left). The author seems to be using Eclipse to run SQL code, one can get it from http://goo.gl/5CcuN. To create, or edit database entities and/or schema otherwise one can use a universal tool called SQuirreL, get it from http://squirrel-sql.sourceforge.net. Next, I must confess Diethard is very knowledgeable in what he does and beyond. However, there will be some accent heard to the user of the course especially if one’s mother tongue language is English, but it I got over it in a few chapters. I liked the rate at which the material is being presented, it makes me feel I paid for every second Eventually, my impressions are: Pentaho is an awesome ETL offering, it is worth learning it very much (I am an ETL fan and a heavy user of SSIS) MonetDB is nice, it tickles my fancy to know it more Data Warehousing, despite all the BigData tool offerings (Hive, Scoop, Pig on Hadoop), using the traditional tools still rocks Chapters 2 to 6 were the most fun to me with chapter 8 being the most difficult. In terms of closing, I highly recommend this video to anyone who needs to grasp Pentaho concepts quick, likewise, the course is very well suited for any developer on a “supposed to be done yesterday” type of a project. It is for a beginner to intermediate level ETL/DW developer. But one would need to learn more on Data Warehousing and Pentaho, for such I recommend the 5 star Pentaho Data Integration 4 Cookbook. Enjoy it! Disclaimer: I received this video from the publisher for the purpose of a public review.

Read the article
Building a Data Mart with Pentaho Data Integration Video Review by Diethard Steiner, Packt Publishing

- by Compudicted

Originally posted on: http://geekswithblogs.net/Compudicted/archive/2014/06/01/building-a-data-mart-with-pentaho-data-integration-video-review-again.aspx The Building a Data Mart with Pentaho Data Integration Video by Diethard Steiner from Packt Publishing is more than just a course on how to use Pentaho Data Integration, it also implements and uses the principals of the Data Warehousing (and I even heard the name of Ralph Kimball in the video). Indeed, a video watcher should be familiar with its concepts as the Star Schema, Slowly Changing Dimension types, etc. so I suggest prior to watching this course to consider skimming through the Data Warehouse concepts (if unfamiliar) or even better, read the excellent Ralph’s The Data Warehouse Tooolkit. By the way, the author expands beyond using Pentaho along to MySQL and MonetDB which is a real icing on the cake! Indeed, I even suggest the name of the course should be ‘Building a Data Warehouse with Pentaho’. To successfully complete the course one needs to know some Linux (Ubuntu used in the course), the VI editor and the Bash command shell, but it seems that similar requirements would also apply to the Windows OS. Additionally, knowing some basic SQL would not hurt. As I had said, MonetDB is used in this course several times which seems to be not anymore complex than say MySQL, but based on what I read is very well suited for fast querying big volumes of data thanks to having a columnstore (vertical data storage). I don’t see what else can be a barrier, the material is very digestible. On this note, I must add that the author does not cover how to acquire the software, so here is what I found may help: Pentaho: the free Community Edition must be more than anyone needs to learn it. Or even go into a POC. MonetDB can be downloaded (exists for both, Linux and Windows) from http://goo.gl/FYxMy0 (just see the appropriate link on the left). The author seems to be using Eclipse to run SQL code, one can get it from http://goo.gl/5CcuN. To create, or edit database entities and/or schema otherwise one can use a universal tool called SQuirreL, get it from http://squirrel-sql.sourceforge.net. Next, I must confess Diethard is very knowledgeable in what he does and beyond. However, there will be some accent heard to the user of the course especially if one’s mother tongue language is English, but it I got over it in a few chapters. I liked the rate at which the material is being presented, it makes me feel I paid for every second Eventually, my impressions are: Pentaho is an awesome ETL offering, it is worth learning it very much (I am an ETL fan and a heavy user of SSIS) MonetDB is nice, it tickles my fancy to know it more Data Warehousing, despite all the BigData tool offerings (Hive, Scoop, Pig on Hadoop), using the traditional tools still rocks Chapters 2 to 6 were the most fun to me with chapter 8 being the most difficult. In terms of closing, I highly recommend this video to anyone who needs to grasp Pentaho concepts quick, likewise, the course is very well suited for any developer on a “supposed to be done yesterday” type of a project. It is for a beginner to intermediate level ETL/DW developer. But one would need to learn more on Data Warehousing and Pentaho, for such I recommend the 5 star Pentaho Data Integration 4 Cookbook. Enjoy it! Disclaimer: I received this video from the publisher for the purpose of a public review.

Read the article
Microsoft Cloud Day - the ups and downs

- by Charles Young

The term ‘cloud’ can sometimes obscure the obvious. Today’s Microsoft Cloud Day conference in London provided a good example. Scott Guthrie was halfway through what was an excellent keynote when he lost network connectivity. This proved very disruptive to his presentation which centred on a series of demonstrations of the Azure platform in action. Great efforts were made to find a solution, but no quick fix presented itself. The venue’s IT facilities were dreadful – no WiFi, poor 3G reception (forget 4G…this is the UK) and, unbelievably, no-one on hand from the venue staff to help with infrastructure issues. Eventually, after an unscheduled break, a solution was found and Scott managed to complete his demonstrations. Further connectivity issues occurred during the day. I can say that the cause was prosaic. A member of the venue staff had interfered with a patch board and inadvertently disconnected Scott Guthrie’s machine from the network by pulling out a cable. I need to state the obvious here. If your PC is disconnected from the network it can’t communicate with other systems. This could include a machine under someone’s desk, a mail server located down the hall, a server in the local data centre, an Internet search engine or even, heaven forbid, a role running on Azure. Inadvertently disconnecting a PC from the network does not imply a fundamental problem with the cloud or any specific cloud platform. Some of the tweeted comments I’ve seen today are analogous to suggesting that, if you accidently unplug your microwave from the mains, this suggests some fundamental flaw with the electricity supply to your house. This is poor reasoning, to say the least. As far as the conference was concerned, the connectivity issue in the keynote, coupled with some later problems in a couple of presentations, served to exaggerate the perception of poor organisation. Software problems encountered before the conference prevented the correct set-up of a smartphone app intended to convey agenda information to attendees. Although some information was available via this app, the organisers decided to print out an agenda at the last moment. Unfortunately, the agenda sheet did not convey enough information, and attendees were forced to approach conference staff through the day to clarify locations of the various presentations. Despite these problems, the overwhelming feedback from conference attendees was very positive. There was a real sense of excitement in the morning keynote. For many, this was their first sight of new Azure features delivered in the ‘spring’ release. The most common reaction I heard was amazement and appreciation that Azure’s new IaaS features deliver built-in template support for several flavours of Linux from day one. This coupled with open source SDKs and several presentations on Azure’s support for Java, node.js, PHP, MongoDB and Hadoop served to communicate that the Azure platform is maturing quickly. The new virtual network capabilities also surprised many attendees, and the much improved portal experience went down very well. So, despite some very irritating and disruptive problems, the event served its purpose well, communicating the breadth and depth of the newly upgraded Azure platform. I enjoyed the day very much.

Read the article
Big Data – Beginning Big Data Series Next Month in 21 Parts

- by Pinal Dave

Big Data is the next big thing. There was a time when we used to talk in terms of MB and GB of the data. However, the industry is changing and we are now moving to a conversation where we discuss about data in Petabyte, Exabyte and Zettabyte. It seems that the world is now talking about increased Volume of the data. In simple world we all think that Big Data is nothing but plenty of volume. In reality Big Data is much more than just a huge volume of the data. When talking about the data we need to understand about variety and volume along with volume. Though Big data look like a simple concept, it is extremely complex subject when we attempt to start learning the same. My Journey I have recently presented on Big Data in quite a few organizations and I have received quite a few questions during this roadshow event. I have collected all the questions which I have received and decided to post about them on the blog. In the month of October 2013, on every weekday we will be learning something new about Big Data. Every day I will share a concept/question and in the same blog post we will learn the answer of the same. Big Data – Plenty of Questions I received quite a few questions during my road trip. Here are few of the questions. I want to learn Big Data – where should I start? Do I need to know SQL to learn Big Data? What is Hadoop? There are so many organizations talking about Big Data, and every one has a different approach. How to start with big Data? Do I need to know Java to learn about Big Data? What is different between various NoSQL languages. I will attempt to answer most of the questions during the month long series in the next month. Big Data – Big Subject Big Data is a very big subject and I no way claim that I will be covering every single big data concept in this series. However, I promise that I will be indeed sharing lots of basic concepts which are revolving around Big Data. We will discuss from fundamentals about Big Data and continue further learning about it. I will attempt to cover the concept so simple that many of you might have wondered about it but afraid to ask. Your Role! During this series next month, I need your one help. Please keep on posting questions you might have related to big data as blog post comments and on Facebook Page. I will monitor them closely and will try to answer them as well during this series. Now make sure that you do not miss any single blog post in this series as every blog post will be linked to each other. You can subscribe to my feed or like my Facebook page or subscribe via email (by entering email in the blog post). Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Big Data, PostADay, SQL, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Big Data – Basics of Big Data Architecture – Day 4 of 21

- by Pinal Dave

In yesterday’s blog post we understood how Big Data evolution happened. Today we will understand basics of the Big Data Architecture. Big Data Cycle Just like every other database related applications, bit data project have its development cycle. Though three Vs (link) for sure plays an important role in deciding the architecture of the Big Data projects. Just like every other project Big Data project also goes to similar phases of the data capturing, transforming, integrating, analyzing and building actionable reporting on the top of the data. While the process looks almost same but due to the nature of the data the architecture is often totally different. Here are few of the question which everyone should ask before going ahead with Big Data architecture. Questions to Ask How big is your total database? What is your requirement of the reporting in terms of time – real time, semi real time or at frequent interval? How important is the data availability and what is the plan for disaster recovery? What are the plans for network and physical security of the data? What platform will be the driving force behind data and what are different service level agreements for the infrastructure? This are just basic questions but based on your application and business need you should come up with the custom list of the question to ask. As I mentioned earlier this question may look quite simple but the answer will not be simple. When we are talking about Big Data implementation there are many other important aspects which we have to consider when we decide to go for the architecture. Building Blocks of Big Data Architecture It is absolutely impossible to discuss and nail down the most optimal architecture for any Big Data Solution in a single blog post, however, we can discuss the basic building blocks of big data architecture. Here is the image which I have built to explain how the building blocks of the Big Data architecture works. Above image gives good overview of how in Big Data Architecture various components are associated with each other. In Big Data various different data sources are part of the architecture hence extract, transform and integration are one of the most essential layers of the architecture. Most of the data is stored in relational as well as non relational data marts and data warehousing solutions. As per the business need various data are processed as well converted to proper reports and visualizations for end users. Just like software the hardware is almost the most important part of the Big Data Architecture. In the big data architecture hardware infrastructure is extremely important and failure over instances as well as redundant physical infrastructure is usually implemented. NoSQL in Data Management NoSQL is a very famous buzz word and it really means Not Relational SQL or Not Only SQL. This is because in Big Data Architecture the data is in any format. It can be unstructured, relational or in any other format or from any other data source. To bring all the data together relational technology is not enough, hence new tools, architecture and other algorithms are invented which takes care of all the kind of data. This is collectively called NoSQL. Tomorrow Next four days we will answer the Buzz Words – Hadoop. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Cutting Subscriber Churn with Media Intelligence

- by Oracle M&E

There's lots of talk in media and entertainment companies about using "big data". But it's often hard to see through the hype and understand how big data brings benefits in the real world. How about being able to predict with 92% accuracy which subscribers intend to cancel their subscription - and put in place a renewal strategy to dramatically reduce that churn? That's what Belgian media company De Persgroep has achieved with Oracle's Media Intelligence solution. "One of the areas in which we're able to achieve beautiful results using big data is the churn prediction," De Persgroep's CIO Luc Verbist explains in a new Oracle video. "Based on all the data that we collect on websites and all your behavior, payment behavior and so on, we're able to make a prediction model, which, with an accuracy of 92 percent, is able to predict that you probably won't renew your newspaper, anymore. So our approach to renewal is completely different to the people in that segment than towards the other people. And this has brought us a lot of value and a lot of customers who didn't stop their newspaper where else they would have done so." De Persgroep is using Oracle's Big Data Appliance, along with software from Oracle partner NGDATA to build up a detailed "DNA profile" of each individual customer, based on every interaction, in real time. This means that any change in behavior - a drop in content consumption, a late subscription payment, a negative social media comment - is captured. Applying advanced data modeling techniques automatically converts those raw interactions into data with real business meaning - like that customer's risk of churning. The very same data profile - comprising hundreds if individual dimensions - can simultaneously drive targeted marketing campaigns - informing audience about new content that's most relevant and encouraging them to subscribe. It can power content recommendations and personalization right in the content sites and apps. And it can link directly into digital advertising networks via platforms like Oracle's BlueKai data management platform (DMP), to drive increased advertising CPMs. Using Oracle's Media Intelligence solution enables this across De Persgroep's business - comprising eight newspapers and 25 magazines published in Belgium and The Netherlands, and digital properties including websites with 6m daily unique visitors, along with TV and radio stations. "The company strategy is in fact a customer-centric strategy, so we want to get a 360-view about our customers, about our prospects. And the big data project helped us to achieve that goal," says Verbist. Using Oracle's Big Data Appliance to underpin the solution created huge savings. "The selection of the Big Data Appliance was quite easy. It was very quick to install, very easy to install, as well. And it was far cheaper than building our own Hadoop cluster. So it was in fact a non-brainer," Verbist explains. Applying Media Intelligence approach has yielded incredible results for De Persgroep, including: Improved products - with a new understanding of how readers are consuming print and digital content across the day Improved customer segmentation - driving a 6X improvement in customer prospecting and acquisition when contacting a specific segment Having the project up and running in three months And that has led to competitive benefits for De Persgroep, as Luc Verbist explains: "one of the results we saw since we started using big data is that we're able to increase the gap between we as the market leader, and the second [by] more than 20 percent."

Read the article
Big Data – How to become a Data Scientist and Learn Data Science? – Day 19 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the analytics in Big Data Story. In this article we will understand how to become a Data Scientist for Big Data Story. Data Scientist is a new buzz word, everyone seems to be wanting to become Data Scientist. Let us go over a few key topics related to Data Scientist in this blog post. First of all we will understand what is a Data Scientist. In the new world of Big Data, I see pretty much everyone wants to become Data Scientist and there are lots of people I have already met who claims that they are Data Scientist. When I ask what is their role, I have got a wide variety of answers. What is Data Scientist? Data scientists are the experts who understand various aspects of the business and know how to strategies data to achieve the business goals. They should have a solid foundation of various data algorithms, modeling and statistics methodology. What do Data Scientists do? Data scientists understand the data very well. They just go beyond the regular data algorithms and builds interesting trends from available data. They innovate and resurrect the entire new meaning from the existing data. They are artists in disguise of computer analyst. They look at the data traditionally as well as explore various new ways to look at the data. Data Scientists do not wait to build their solutions from existing data. They think creatively, they think before the data has entered into the system. Data Scientists are visionary experts who understands the business needs and plan ahead of the time, this tremendously help to build solutions at rapid speed. Besides being data expert, the major quality of Data Scientists is “curiosity”. They always wonder about what more they can get from their existing data and how to get maximum out of future incoming data. Data Scientists do wonders with the data, which goes beyond the job descriptions of Data Analysist or Business Analysist. Skills Required for Data Scientists Here are few of the skills a Data Scientist must have. Expert level skills with statistical tools like SAS, Excel, R etc. Understanding Mathematical Models Hands-on with Visualization Tools like Tableau, PowerPivots, D3. j’s etc. Analytical skills to understand business needs Communication skills On the technology front any Data Scientists should know underlying technologies like (Hadoop, Cloudera) as well as their entire ecosystem (programming language, analysis and visualization tools etc.) . Remember that for becoming a successful Data Scientist one require have par excellent skills, just having a degree in a relevant education field will not suffice. Final Note Data Scientists is indeed very exciting job profile. As per research there are not enough Data Scientists in the world to handle the current data explosion. In near future Data is going to expand exponentially, and the need of the Data Scientists will increase along with it. It is indeed the job one should focus if you like data and science of statistics. Courtesy: emc Tomorrow In tomorrow’s blog post we will discuss about various Big Data Learning resources. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Big Data – Operational Databases Supporting Big Data – Columnar, Graph and Spatial Database – Day 14 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the Key-Value Pair Databases and Document Databases in the Big Data Story. In this article we will understand the role of Columnar, Graph and Spatial Database supporting Big Data Story. Now we will see a few of the examples of the operational databases. Relational Databases (The day before yesterday’s post) NoSQL Databases (The day before yesterday’s post) Key-Value Pair Databases (Yesterday’s post) Document Databases (Yesterday’s post) Columnar Databases (Tomorrow’s post) Graph Databases (Today’s post) Spatial Databases (Today’s post) Columnar Databases Relational Database is a row store database or a row oriented database. Columnar databases are column oriented or column store databases. As we discussed earlier in Big Data we have different kinds of data and we need to store different kinds of data in the database. When we have columnar database it is very easy to do so as we can just add a new column to the columnar database. HBase is one of the most popular columnar databases. It uses Hadoop file system and MapReduce for its core data storage. However, remember this is not a good solution for every application. This is particularly good for the database where there is high volume incremental data is gathered and processed. Graph Databases For a highly interconnected data it is suitable to use Graph Database. This database has node relationship structure. Nodes and relationships contain a Key Value Pair where data is stored. The major advantage of this database is that it supports faster navigation among various relationships. For example, Facebook uses a graph database to list and demonstrate various relationships between users. Neo4J is one of the most popular open source graph database. One of the major dis-advantage of the Graph Database is that it is not possible to self-reference (self joins in the RDBMS terms) and there might be real world scenarios where this might be required and graph database does not support it. Spatial Databases We all use Foursquare, Google+ as well Facebook Check-ins for location aware check-ins. All the location aware applications figure out the position of the phone with the help of Global Positioning System (GPS). Think about it, so many different users at different location in the world and checking-in all together. Additionally, the applications now feature reach and users are demanding more and more information from them, for example like movies, coffee shop or places see. They are all running with the help of Spatial Databases. Spatial data are standardize by the Open Geospatial Consortium known as OGC. Spatial data helps answering many interesting questions like “Distance between two locations, area of interesting places etc.” When we think of it, it is very clear that handing spatial data and returning meaningful result is one big task when there are millions of users moving dynamically from one place to another place & requesting various spatial information. PostGIS/OpenGIS suite is very popular spatial database. It runs as a layer implementation on the RDBMS PostgreSQL. This makes it totally unique as it offers best from both the worlds. Courtesy: mushroom network Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Hive. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Additional new material WebLogic Community

- by JuergenKress

Virtual Developer Conference On Demand - Register Updated Book: WebLogic 12c: Distinctive Recipes - Architecture, Development, Administration by Oracle ACE Director Frank Munz - Blog | YouTube Webcast: Migrating from GlassFish to WebLogic - Replay Reliance Commercial Finance Accelerates Time-to-Market, Improves IT Staff Productivity by 70% - Blog | Oracle Magazine Retrieving WebLogic Server Name and Port in ADF Application by Andrejus Baranovskis, Oracle Ace Director - Blog Using Oracle WebLogic 12c with NetBeans IDEOracle ACE Director Markus Eisele walks you through installing and configuring all the necessary components, and helps you get started with a simple Hello World project. Read the article. Video: Oracle A-Team ADF Mobile Persistence SampleThis video by Oracle Fusion Middleware A-Team architect Steven Davelaar demonstrates how to use the ADF Mobile Persistence Sample JDeveloper extension to generate a fully functional ADF Mobile application that reads and writes data using an ADF BC SOAP web service. Watch the video. Java ME 8 ReleaseDownload Java ME today! This release is an implementation of the Java ME 8 standards JSR 360 (CLDC 8) and JSR 361 (MEEP 8), and includes support of alignment with Java SE 8 language features and APIs, an enhanced services-enabled application platform, the ability to "right-size" the platform to address a wide range of target devices, and more. Learn more Download Java ME SDK 8It includes application development support for Oracle Java ME Embedded 8 platforms and includes plugins for NetBeans 8. See the Java ME 8 Developer Tools Documentation to learn JavaOne 2014 Early Bird RateRegister early to save $400 off the onsite price. With the release of Java 8 this year, we have exciting new sessions and an interactive demo space! NetBeans IDE 8.0 Patch UpdateThe NetBeans Team has released a patch for NetBeans IDE 8.0. Download it today to get fixes that enhance stability and performance. Java 8 Questions ForumFor any questions about this new release, please join the conversation on the Java 8 Questions Forum. Java ME 8: Getting Started with Samples and Demo CodeLearn in few steps how to get started with Java ME 8! The New Java SE 8 FeaturesJava SE 8 introduces enhancements such as lambda expressions that enable you to write more concise yet readable code, better utilize multicore systems, and detect more errors at compile time. See What's New in JDK 8 and the new Java SE 8 documentation portal. Pay Less for Java-Related Books!Save 20% on all new Oracle Press books related to Java. Download the free preview sampler for the Java 8 book written by Herbert Schildt, Maurice Naftain, Henrik Ebbers and J.F. DiMarzio. New book: EJB 3 in Action, Second Edition WebLogic 12c Does WebSockets Getting Started by C2B2 Video: Building Robots with Java Embedded Video: Nighthacking TV Watch presentations by Stephen Chin and community members about Java SE, Java Embedded, Java EE, Hadoop, Robots and more. Migrating the Spring Pet Clinic to Java EE 7 Trip report : Jozi JUG Java Day in Johannesburg How to Build GlassFish 4 from Source 4,000 posts later : The Aquarium WebLogic Partner Community For regular information become a member in the WebLogic Partner Community please visit: http://www.oracle.com/partners/goto/wls-emea ( OPN account required). If you need support with your account please contact the Oracle Partner Business Center. Blog Twitter LinkedIn Mix Forum Wiki Technorati Tags: WebLogic,WebLogic Community,Oracle,OPN,Jürgen Kress

Read the article

Search Results

Search found 401 results on 17 pages for 'hadoop'.

Page 15/17 | < Previous Page | 11 12 13 14 15 16 17 | Next Page >

- by Irem Radzik

- by Pinal Dave

- by Lostsoul

- by Davide Mauri

- by JuergenKress

- by BuckWoody

- by Harihara Vinayakaram

- by cerd

- by Dharmavir

- by JD Long

- by Tim van Elteren

- by Simon Elliston Ball

- by Thanos Terentes Printzios

- by Pinal Dave

- by Dain C. Hansen

- by Darius Zanganeh

- by Compudicted

- by Compudicted

- by Charles Young

- by Pinal Dave

- by Pinal Dave

- by Oracle M&E

- by Pinal Dave

- by Pinal Dave

- by JuergenKress

< Previous Page | 11 12 13 14 15 16 17 | Next Page >