New Feature in ODI 11.1.1.6: ODI for Big Data
- by Julien Testut
By Ananth Tirupattur
Starting with Oracle Data Integrator 11.1.1.6.0, ODI offers a solution for processing Big Data. This post provides an overview of this feature.
With all the buzz around Big Data, and before getting into the details of ODI for Big Data, I will give a brief introduction to Big Data and to Oracle's solution for Big Data.
So, what is Big Data?
Big Data includes:
structured data (for example, data from relational data stores and XML data stores)
semi-structured data (for example, data from weblogs)
unstructured data (for example, text blobs and images)
Traditionally, business decisions are based on information gathered from transactional data. For example, transactional data from CRM applications is fed to a decision system for analysis and decision making. Products such as ODI play a key role in enabling decision systems. However, with the emergence of massive amounts of semi-structured and unstructured data, it is important for decision systems to include this data in their analysis to achieve better decision-making capability.
While Big Data offers businesses abundant opportunities to gain competitive advantage, processing it has its challenges. The challenges of processing Big Data include:
Volume of data
Velocity of data: the high rate at which data is generated
Variety of data
To address these challenges and convert them into opportunities, we need an appropriate framework, a platform, and the right set of tools.
Hadoop is an open source framework that provides a highly scalable, fault-tolerant system for storing and processing large amounts of data. Hadoop provides two key services: distributed and reliable storage, called the Hadoop Distributed File System (HDFS), and a framework for parallel data processing called Map-Reduce.
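To make the Map-Reduce idea more concrete, here is a minimal single-process sketch of its three steps (map, group by key, reduce) using a word count. It is plain Python, not Hadoop code, and the function names are illustrative only; a real Hadoop job distributes these steps across the nodes of a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between the two phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Because each call to `map_phase` looks at a single line and each call to `reduce_phase` looks at a single key, both steps can run in parallel on different machines, which is what lets the approach scale to large data volumes.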
Hadoop and its related technologies continue to evolve rapidly, so it is highly recommended to follow information on the web to keep up with the latest developments.
Oracle's vision is to provide a comprehensive solution to address the challenges posed by Big Data. Oracle is providing the necessary hardware, software, and tools for processing Big Data.
The Oracle solution includes:
Big Data Appliance
Oracle NoSQL Database
Cloudera distribution for Hadoop
Oracle R Enterprise (R is a statistical package that is very popular among data scientists)
ODI solution for Big Data
Oracle Loader for Hadoop, for loading data from Hadoop to Oracle
Further details can be found here: http://www.oracle.com/us/products/database/big-data-appliance/overview/index.html
ODI Solution for Big Data:
ODI’s goal is to minimize the need to understand the complexity of the Hadoop framework and to simplify the adoption of Big Data processing in an enterprise.
ODI provides the capabilities for an integrated architecture for processing Big Data. This includes the capability to load data into Hadoop, process data in Hadoop, and load data from Hadoop into Oracle.
ODI is expanding its support for Big Data by providing the following out-of-the-box Knowledge Modules (KMs):
IKM File to Hive (LOAD DATA): Load unstructured data from a file (local file system or HDFS) into Hive
IKM Hive Control Append: Transform and validate structured data on Hive
IKM Hive Transform: Transform unstructured data on Hive
IKM File/Hive to Oracle (OLH): Load processed data in Hive to Oracle
RKM Hive: Reverse engineer Hive tables to generate models
Using the Loading KM you can map files (local and
HDFS files) to the corresponding Hive tables. For example, you can map weblog
files categorized by date into a corresponding partitioned Hive table schema.
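As an illustration of what such a mapping amounts to on the Hive side, the sketch below builds a Hive `LOAD DATA` statement for one dated weblog file. The post does not show the exact statement the KM generates, so treat this as an assumption; the table name, file path, and partition column `log_date` are hypothetical.

```python
import re
from datetime import date

# Conservative check for table and column names (an assumption of this sketch,
# not a rule taken from ODI or Hive).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_load_statement(path: str, table: str, log_date: date,
                         local: bool = False,
                         partition_column: str = "log_date") -> str:
    """Build a Hive LOAD DATA statement that targets one date partition."""
    if not _IDENTIFIER.match(table) or not _IDENTIFIER.match(partition_column):
        raise ValueError("table and partition column must be simple identifiers")
    if "'" in path:
        raise ValueError("path must not contain single quotes")
    # LOCAL reads from the local file system; without it the path is on HDFS.
    local_keyword = "LOCAL " if local else ""
    return (
        f"LOAD DATA {local_keyword}INPATH '{path}' INTO TABLE {table} "
        f"PARTITION ({partition_column}='{log_date.isoformat()}')"
    )
```

For example, `build_load_statement("/logs/2012-03-01/access.log", "weblogs", date(2012, 3, 1))` produces a statement that loads the HDFS file into the `log_date='2012-03-01'` partition of the hypothetical `weblogs` table.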
Using the Hive Control Append KM, you can validate and transform data in Hive. In the example below, two source Hive tables are joined and mapped to a target Hive table.
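The join, validate, and append pattern described here can be sketched in a few lines of plain Python. This is only a conceptual model running in memory, not what the KM executes in Hive; the `customers` and `orders` sources, their columns, and the validation rule (no missing or negative amounts) are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def join_validate_append(customers: List[Row], orders: List[Row],
                         target: List[Row]) -> List[Row]:
    """Join orders to customers, append valid rows to target, return rejected rows."""
    customers_by_id = {c["customer_id"]: c for c in customers}
    rejects: List[Row] = []
    for order in orders:
        customer = customers_by_id.get(order["customer_id"])
        if customer is None:
            continue  # inner join: skip orders without a matching customer
        row = {
            "order_id": order["order_id"],
            "customer_name": customer["name"],
            "amount": order["amount"],
        }
        # Validation step: rows that break the rule are set aside, not loaded.
        if row["amount"] is None or row["amount"] < 0:
            rejects.append(row)
        else:
            target.append(row)
    return rejects
```

Keeping the rejected rows separate, instead of silently dropping them, makes it possible to inspect and correct bad source data after the load.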
The Hive Transform KM facilitates processing of semi-structured data in Hive. In the example below, the data from a weblog is processed using a Perl script and mapped to a target Hive table.
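The Perl script from the example is not reproduced in this post. As a rough idea of what such a transformation script does, here is a comparable sketch in Python. It assumes the weblog lines follow the Apache Common Log Format and that, as with Hive's `TRANSFORM` clause, the script reads rows on standard input and writes tab-separated columns to standard output. The choice of output columns is hypothetical.

```python
import re
import sys
from typing import Iterable, Iterator, Optional, TextIO, Tuple

# Common Log Format: host ident user [time] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line: str) -> Optional[Tuple[str, str, str, str, str]]:
    """Extract (host, time, method, path, status) or return None if malformed."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return (match.group("host"), match.group("time"), match.group("method"),
            match.group("path"), match.group("status"))


def transform(lines: Iterable[str]) -> Iterator[str]:
    """Turn raw log lines into tab-separated rows, skipping malformed lines."""
    for line in lines:
        fields = parse_line(line.rstrip("\n"))
        if fields is not None:
            yield "\t".join(fields)


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Stream rows from stdin to stdout, one output row per valid input line."""
    for row in transform(stdin):
        stdout.write(row + "\n")
```

To use this as a standalone script, call `main()` at the bottom of the file under an `if __name__ == "__main__":` guard. Malformed lines are skipped here for brevity; a production script would more likely count or log them.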
Using the Oracle Loader for Hadoop (OLH) KM, you can load data from a Hive table or HDFS into a corresponding table in Oracle. OLH is available as a standalone product. ODI greatly enhances OLH by generating the configuration and mapping files for OLH based on the configuration provided in the interface and the KM options. ODI seamlessly invokes OLH when executing the scenario. In the example below, an HDFS file is mapped to a table in Oracle.
Development and Deployment:
The following diagram illustrates the development and deployment of an ODI solution for Big Data. Using ODI Studio on your development machine, create and develop an ODI solution for processing Big Data by connecting to a MySQL or Oracle database on a BDA machine or Hadoop cluster. Then schedule the ODI scenarios to be executed on the ODI agent deployed on the BDA machine or Hadoop cluster.
The ODI solution for Big Data provides several exciting new capabilities to facilitate the adoption of Big Data in an enterprise. You can find more information about the Oracle Big Data Connectors on OTN.
You can find an overview of all the new features introduced in ODI 11.1.1.6 in the following document: ODI 11.1.1.6 New Features Overview