Search Results

Search found 71826 results on 2874 pages for 'master data services'.

Page 3/2874 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Big Data – Operational Databases Supporting Big Data – RDBMS and NoSQL – Day 12 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the Cloud in the Big Data Story. In this article we will understand the role of Operational Databases Supporting Big Data Story. Even though we keep on talking about Big Data architecture, it is extremely crucial to understand that Big Data system can’t just exist in the isolation of itself. There are many needs of the business can only be fully filled with the help of the operational databases. Just having a system which can analysis big data may not solve every single data problem. Real World Example Think about this way, you are using Facebook and you have just updated your information about the current relationship status. In the next few seconds the same information is also reflected in the timeline of your partner as well as a few of the immediate friends. After a while you will notice that the same information is now also available to your remote friends. Later on when someone searches for all the relationship changes with their friends your change of the relationship will also show up in the same list. Now here is the question – do you think Big Data architecture is doing every single of these changes? Do you think that the immediate reflection of your relationship changes with your family member is also because of the technology used in Big Data. Actually the answer is Facebook uses MySQL to do various updates in the timeline as well as various events we do on their homepage. It is really difficult to part from the operational databases in any real world business. Now we will see a few of the examples of the operational databases. Relational Databases (This blog post) NoSQL Databases (This blog post) Key-Value Pair Databases (Tomorrow’s post) Document Databases (Tomorrow’s post) Columnar Databases (The Day After’s post) Graph Databases (The Day After’s post) Spatial Databases (The Day After’s post) Relational Databases We have earlier discussed about the RDBMS role in the Big Data’s story in detail so we will not cover it extensively over here. Relational Database is pretty much everywhere in most of the businesses which are here for many years. The importance and existence of the relational database are always going to be there as long as there are meaningful structured data around. There are many different kinds of relational databases for example Oracle, SQL Server, MySQL and many others. If you are looking for Open Source and widely accepted database, I suggest to try MySQL as that has been very popular in the last few years. I also suggest you to try out PostgreSQL as well. Besides many other essential qualities PostgreeSQL have very interesting licensing policies. PostgreSQL licenses allow modifications and distribution of the application in open or closed (source) form. One can make any modifications and can keep it private as well as well contribute to the community. I believe this one quality makes it much more interesting to use as well it will play very important role in future. Nonrelational Databases (NOSQL) We have also covered Nonrelational Dabases in earlier blog posts. NoSQL actually stands for Not Only SQL Databases. There are plenty of NoSQL databases out in the market and selecting the right one is always very challenging. Here are few of the properties which are very essential to consider when selecting the right NoSQL database for operational purpose. Data and Query Model Persistence of Data and Design Eventual Consistency Scalability Though above all of the properties are interesting to have in any NoSQL database but the one which most attracts to me is Eventual Consistency. Eventual Consistency RDBMS uses ACID (Atomicity, Consistency, Isolation, Durability) as a key mechanism for ensuring the data consistency, whereas NonRelational DBMS uses BASE for the same purpose. Base stands for Basically Available, Soft state and Eventual consistency. Eventual consistency is widely deployed in distributed systems. It is a consistency model used in distributed computing which expects unexpected often. In large distributed system, there are always various nodes joining and various nodes being removed as they are often using commodity servers. This happens either intentionally or accidentally. Even though one or more nodes are down, it is expected that entire system still functions normally. Applications should be able to do various updates as well as retrieval of the data successfully without any issue. Additionally, this also means that system is expected to return the same updated data anytime from all the functioning nodes. Irrespective of when any node is joining the system, if it is marked to hold some data it should contain the same updated data eventually. As per Wikipedia - Eventual consistency is a consistency model used in distributed computing that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value. In other words -  Informally, if no additional updates are made to a given data item, all reads to that item will eventually return the same value. Tomorrow In tomorrow’s blog post we will discuss about various other Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • timetable in a jTable

    - by chandra
    I want to create a timetable in a jTable. For the top row it will display from monday to sunday and the left colume will display the time of the day with 2h interval e.g 1st colume (0000 - 0200), 2nd colume (0200 - 0400) .... And if i click a button the timing will change from 2h interval to 1h interval. I do not want to hardcode it because i need to do for 2h, 1h, 30min , 15min, 1min, 30sec and 1 sec interval and it will take too long for me to hardcode. Can anyone show me an example or help me create an example for the 2h to 1h interval so that i know what to do? The data array is for me to store data and are there any other easier or shortcuts for me to store them because if it is in 1 sec interval i got thousands of array i need to type it out. private void oneHour() //1 interval functions { if(!once) { initialize(); once = true; } jTable.setModel(new javax.swing.table.DefaultTableModel( new Object [][] { {"0000 - 0100", data[0][0], data[0][1], data[0][2], data[0][3], data[0][4], data[0][5], data[0][6]}, {"0100 - 0200", data[2][0], data[2][1], data[2][2], data[2][3], data[2][4], data[2][5], data[2][6]}, {"0200 - 0300", data[4][0], data[4][1], data[4][2], data[4][3], data[4][4], data[4][5], data[4][6]}, {"0300 - 0400", data[6][0], data[6][1], data[6][2], data[6][3], data[6][4], data[6][5], data[6][6]}, {"0400 - 0600", data[8][0], data[8][1], data[8][2], data[8][3], data[8][4], data[8][5], data[8][6]}, {"0600 - 0700", data[10][0], data[4][1], data[10][2], data[10][3], data[10][4], data[10][5], data[10][6]}, {"0700 - 0800", data[12][0], data[12][1], data[12][2], data[12][3], data[12][4], data[12][5], data[12][6]}, {"0800 - 0900", data[14][0], data[14][1], data[14][2], data[14][3], data[14][4], data[14][5], data[14][6]}, {"0900 - 1000", data[16][0], data[16][1], data[16][2], data[16][3], data[16][4], data[16][5], data[16][6]}, {"1000 - 1100", data[18][0], data[18][1], data[18][2], data[18][3], data[18][4], data[18][5], data[18][6]}, {"1100 - 1200", data[20][0], data[20][1], data[20][2], data[20][3], data[20][4], data[20][5], data[20][6]}, {"1200 - 1300", data[22][0], data[22][1], data[22][2], data[22][3], data[22][4], data[22][5], data[22][6]}, {"1300 - 1400", data[24][0], data[24][1], data[24][2], data[24][3], data[24][4], data[24][5], data[24][6]}, {"1400 - 1500", data[26][0], data[26][1], data[26][2], data[26][3], data[26][4], data[26][5], data[26][6]}, {"1500 - 1600", data[28][0], data[28][1], data[28][2], data[28][3], data[28][4], data[28][5], data[28][6]}, {"1600 - 1700", data[30][0], data[30][1], data[30][2], data[30][3], data[30][4], data[30][5], data[30][6]}, {"1700 - 1800", data[32][0], data[32][1], data[32][2], data[32][3], data[32][4], data[32][5], data[32][6]}, {"1800 - 1900", data[34][0], data[34][1], data[34][2], data[34][3], data[34][4], data[34][5], data[34][6]}, {"1900 - 2000", data[36][0], data[36][1], data[36][2], data[36][3], data[36][4], data[36][5], data[36][6]}, {"2000 - 2100", data[38][0], data[38][1], data[38][2], data[38][3], data[38][4], data[38][5], data[38][6]}, {"2100 - 2200", data[40][0], data[40][1], data[40][2], data[40][3], data[40][4], data[40][5], data[40][6]}, {"2200 - 2300", data[42][0], data[42][1], data[42][2], data[42][3], data[42][4], data[42][5], data[42][6]}, {"2300 - 2400", data[44][0], data[44][1], data[44][2], data[44][3], data[44][4], data[44][5], data[44][6]}, {"2400 - 0000", data[46][0], data[46][1], data[46][2], data[46][3], data[46][4], data[46][5], data[46][6]}, }, new String [] { "Time/Day", "(Mon)", "(Tue)", "(Wed)", "(Thurs)", "(Fri)", "(Sat)", "(Sun)" } )); } private void twoHour() //2 hour interval functions { if(!once) { initialize(); once = true; } jTable.setModel(new javax.swing.table.DefaultTableModel( new Object [][] { {"0000 - 0200", data[0][0], data[0][1], data[0][2], data[0][3], data[0][4], data[0][5], data[0][6]}, {"0200 - 0400", data[4][0], data[4][1], data[4][2], data[4][3], data[4][4], data[4][5], data[4][6]}, {"0400 - 0600", data[8][0], data[8][1], data[8][2], data[8][3], data[8][4], data[8][5], data[8][6]}, {"0600 - 0800", data[12][0], data[12][1], data[12][2], data[12][3], data[12][4], data[12][5], data[12][6]}, {"0800 - 1000", data[16][0], data[16][1], data[16][2], data[16][3], data[16][4], data[16][5], data[16][6]}, {"1000 - 1200", data[20][0], data[20][1], data[20][2], data[20][3], data[20][4], data[20][5], data[20][6]}, {"1200 - 1400", data[24][0], data[24][1], data[24][2], data[24][3], data[24][4], data[24][5], data[24][6]}, {"1400 - 1600", data[28][0], data[28][1], data[28][2], data[28][3], data[28][4], data[28][5], data[28][6]}, {"1600 - 1800", data[32][0], data[32][1], data[32][2], data[32][3], data[32][4], data[32][5], data[32][6]}, {"1800 - 2000", data[36][0], data[36][1], data[36][2], data[36][3], data[36][4], data[36][5], data[36][6]}, {"2000 - 2200", data[40][0], data[40][1], data[40][2], data[40][3], data[40][4], data[40][5], data[40][6]}, {"2200 - 2400",data[44][0], data[44][1], data[44][2], data[44][3], data[44][4], data[44][5], data[44][6]} },

    Read the article

  • Tellago && Tellago Studios 2010

    - by gsusx
    With 2011 around the corner we, at Tellago and Tellago Studios , we have been spending a lot of times evaluating our successes and failures (yes those too ;)) of 2010 and delineating some of our goals and strategies for 2011. When I look at 2010 here are some of the things that quickly jump off the page: Growing Tellago by 300% Launching a brand new company: Tellago Studios Expanding our customer base Establishing our business intelligence practice http://tellago.com/what-we-say/events/business-intelligence...(read more)

    Read the article

  • Oracle Data Mining a Star Schema: Telco Churn Case Study

    - by charlie.berger
    There is a complete and detailed Telco Churn case study "How to" Blog Series just posted by Ari Mozes, ODM Dev. Manager.  In it, Ari provides detailed guidance in how to leverage various strengths of Oracle Data Mining including the ability to: mine Star Schemas and join tables and views together to obtain a complete 360 degree view of a customer combine transactional data e.g. call record detail (CDR) data, etc. define complex data transformation, model build and model deploy analytical methodologies inside the Database  His blog is posted in a multi-part series.  Below are some opening excerpts for the first 3 blog entries.  This is an excellent resource for any novice to skilled data miner who wants to gain competitive advantage by mining their data inside the Oracle Database.  Many thanks Ari! Mining a Star Schema: Telco Churn Case Study (1 of 3) One of the strengths of Oracle Data Mining is the ability to mine star schemas with minimal effort.  Star schemas are commonly used in relational databases, and they often contain rich data with interesting patterns.  While dimension tables may contain interesting demographics, fact tables will often contain user behavior, such as phone usage or purchase patterns.  Both of these aspects - demographics and usage patterns - can provide insight into behavior.Churn is a critical problem in the telecommunications industry, and companies go to great lengths to reduce the churn of their customer base.  One case study1 describes a telecommunications scenario involving understanding, and identification of, churn, where the underlying data is present in a star schema.  That case study is a good example for demonstrating just how natural it is for Oracle Data Mining to analyze a star schema, so it will be used as the basis for this series of posts...... Mining a Star Schema: Telco Churn Case Study (2 of 3) This post will follow the transformation steps as described in the case study, but will use Oracle SQL as the means for preparing data.  Please see the previous post for background material, including links to the case study and to scripts that can be used to replicate the stages in these posts.1) Handling missing values for call data recordsThe CDR_T table records the number of phone minutes used by a customer per month and per call type (tariff).  For example, the table may contain one record corresponding to the number of peak (call type) minutes in January for a specific customer, and another record associated with international calls in March for the same customer.  This table is likely to be fairly dense (most type-month combinations for a given customer will be present) due to the coarse level of aggregation, but there may be some missing values.  Missing entries may occur for a number of reasons: the customer made no calls of a particular type in a particular month, the customer switched providers during the timeframe, or perhaps there is a data entry problem.  In the first situation, the correct interpretation of a missing entry would be to assume that the number of minutes for the type-month combination is zero.  In the other situations, it is not appropriate to assume zero, but rather derive some representative value to replace the missing entries.  The referenced case study takes the latter approach.  The data is segmented by customer and call type, and within a given customer-call type combination, an average number of minutes is computed and used as a replacement value.In SQL, we need to generate additional rows for the missing entries and populate those rows with appropriate values.  To generate the missing rows, Oracle's partition outer join feature is a perfect fit.  select cust_id, cdre.tariff, cdre.month, minsfrom cdr_t cdr partition by (cust_id) right outer join     (select distinct tariff, month from cdr_t) cdre     on (cdr.month = cdre.month and cdr.tariff = cdre.tariff);   ....... Mining a Star Schema: Telco Churn Case Study (3 of 3) Now that the "difficult" work is complete - preparing the data - we can move to building a predictive model to help identify and understand churn.The case study suggests that separate models be built for different customer segments (high, medium, low, and very low value customer groups).  To reduce the data to a single segment, a filter can be applied: create or replace view churn_data_high asselect * from churn_prep where value_band = 'HIGH'; It is simple to take a quick look at the predictive aspects of the data on a univariate basis.  While this does not capture the more complex multi-variate effects as would occur with the full-blown data mining algorithms, it can give a quick feel as to the predictive aspects of the data as well as validate the data preparation steps.  Oracle Data Mining includes a predictive analytics package which enables quick analysis. begin  dbms_predictive_analytics.explain(   'churn_data_high','churn_m6','expl_churn_tab'); end; /select * from expl_churn_tab where rank <= 5 order by rank; ATTRIBUTE_NAME       ATTRIBUTE_SUBNAME EXPLANATORY_VALUE RANK-------------------- ----------------- ----------------- ----------LOS_BAND                                      .069167052          1MINS_PER_TARIFF_MON  PEAK-5                   .034881648          2REV_PER_MON          REV-5                    .034527798          3DROPPED_CALLS                                 .028110322          4MINS_PER_TARIFF_MON  PEAK-4                   .024698149          5From the above results, it is clear that some predictors do contain information to help identify churn (explanatory value > 0).  The strongest uni-variate predictor of churn appears to be the customer's (binned) length of service.  The second strongest churn indicator appears to be the number of peak minutes used in the most recent month.  The subname column contains the interior piece of the DM_NESTED_NUMERICALS column described in the previous post.  By using the object relational approach, many related predictors are included within a single top-level column. .....   NOTE:  These are just EXCERPTS.  Click here to start reading the Oracle Data Mining a Star Schema: Telco Churn Case Study from the beginning.    

    Read the article

  • SQL Rally Pre-Con: Data Warehouse Modeling – Making the Right Choices

    - by Davide Mauri
    As you may have already learned from my old post or Adam’s or Kalen’s posts, there will be two SQL Rally in North Europe. In the Stockholm SQL Rally, with my friend Thomas Kejser, I’ll be delivering a pre-con on Data Warehouse Modeling: Data warehouses play a central role in any BI solution. It's the back end upon which everything in years to come will be created. For this reason, it must be rock solid and yet flexible at the same time. To develop such a data warehouse, you must have a clear idea of its architecture, a thorough understanding of the concepts of Measures and Dimensions, and a proven engineered way to build it so that quality and stability can go hand-in-hand with cost reduction and scalability. In this workshop, Thomas Kejser and Davide Mauri will share all the information they learned since they started working with data warehouses, giving you the guidance and tips you need to start your BI project in the best way possible?avoiding errors, making implementation effective and efficient, paving the way for a winning Agile approach, and helping you define how your team should work so that your BI solution will stand the test of time. You'll learn: Data warehouse architecture and justification Agile methodology Dimensional modeling, including Kimball vs. Inmon, SCD1/SCD2/SCD3, Junk and Degenerate Dimensions, and Huge Dimensions Best practices, naming conventions, and lessons learned Loading the data warehouse, including loading Dimensions, loading Facts (Full Load, Incremental Load, Partitioned Load) Data warehouses and Big Data (Hadoop) Unit testing Tracking historical changes and managing large sizes With all the Self-Service BI hype, Data Warehouse is become more and more central every day, since if everyone will be able to analyze data using self-service tools, it’s better for him/her to rely on correct, uniform and coherent data. Already 50 people registered from the workshop and seats are limited so don’t miss this unique opportunity to attend to this workshop that is really a unique combination of years and years of experience! http://www.sqlpass.org/sqlrally/2013/nordic/Agenda/PreconferenceSeminars.aspx See you there!

    Read the article

  • DonXml does WCF in NYC

    - by gsusx
    Tomorrow is WCF day in New York city!!!!! My good friend and Tellago's CTO Don Demsak will be doing a session WCF Data and RIA Services at the WCF fire-starter event to be hosted at the Microsoft offices in New York city. Don has a encyclopedic knowledge of both technologies and will be sharing lots of best practices learned from applying these technologies in large service oriented environments. In addition to Don, my crazy Cuban friend Miguel Castro will also be presenting three sessions at the...(read more)

    Read the article

  • Oracle Financial Analytics for SAP Certified with Oracle Data Integrator EE

    - by denis.gray
    Two days ago Oracle announced the release of Oracle Financial Analytics for SAP.  With the amount of press this has garnered in the past two days, there's a key detail that can't be missed.  This release is certified with Oracle Data Integrator EE - now making the combination of Data Integration and Business Intelligence a force to contend with.  Within the Oracle Press Release there were two important bullets: ·         Oracle Financial Analytics for SAP includes a pre-packaged ABAP code compliant adapter and is certified with Oracle Data Integrator Enterprise Edition to integrate SAP Financial Accounting data directly with the analytic application.  ·         Helping to integrate SAP financial data and disparate third-party data sources is Oracle Data Integrator Enterprise Edition which delivers fast, efficient loading and transformation of timely data into a data warehouse environment through its high-performance Extract Load and Transform (E-LT) technology. This is very exciting news, demonstrating Oracle's overall commitment to Oracle Data Integrator EE.   This is a great way to start off the new year and we look forward to building on this momentum throughout 2011.   The following links contain additional information and media responses about the Oracle Financial Analytics for SAP release. IDG News Service (Also appeared in PC World, Computer World, CIO: "Oracle is moving further into rival SAP's turf with Oracle Financial Analytics for SAP, a new BI (business intelligence) application that can crunch ERP (enterprise resource planning) system financial data for insights." Information Week: "Oracle talks a good game about the appeal of an optimized, all-Oracle stack. But the company also recognizes that we live in a predominantly heterogeneous IT world" CRN: "While some businesses with SAP Financial Accounting already use Oracle BI, those integrations had to be custom developed. The new offering provides pre-built integration capabilities." ECRM Guide:  "Among other features, Oracle Financial Analytics for SAP helps front-line managers improve financial performance and decision-making with what the company says is comprehensive, timely and role-based information on their departments' expenses and revenue contributions."   SAP Getting Started Guide for ODI on OTN: http://www.oracle.com/technetwork/middleware/data-integrator/learnmore/index.html For more information on the ODI and its SAP connectivity please review the Oracle® Fusion Middleware Application Adapters Guide for Oracle Data Integrator11g Release 1 (11.1.1)

    Read the article

  • MySQL Master - Master Broken

    - by Recc
    I've Inherited a Mysql master master system, I've noticed the second master (lets call it slave from now on as it's running on a 'slave' machine) stopped getting its db's updated. I saw that Master: Slave_IO_Running: Yes Slave_SQL_Running: Yes Slave: (with an error I truncated) Slave_IO_Running: Yes Slave_SQL_Running: No Last_Errno: 1062 Last_Error: Error 'Duplicate entry '3' for key 'PRIMARY'' on [...] I don't know what caused it to process considering we cant get duplicate there. What's important is to resume normal operations; Right now I've stop slave; on the Master and stop slave; on the Slave because I saw that if I change records on the Slave the changes Do Get Propagated to Master which is in active use. How do I: Force sync EVERYTHING from master to slave without affecting data on master? Then hopefully have slave pickup replication as usual? UPDATE OK I Tried deleting all tables on slave then it complained in that error section that the 'table' doesnt exist. So i made a no data dump of Master, and made sure I have only empty tables in Secondary (slave). I start slave; on slave BUT now it's complaining about bloody alter table statements for instance: Last_Errno: 1060 Last_Error: Error 'Duplicate column name [...] Query: 'ALTER TABLE [...] How to skip the fracking alter statements I just want to replicate the bloody data and be done with it, my tables have the lates changes already FFS and now its complaining about changes made after the replication seized weeks ago How do I reset the log or something? OUTSTANDING Why would this start happening? The "Secondary" is propagating to "Primary". "Primary" is not propagating to "Secondary". But any fixes I tried to do left it in the same state Yes-Yes Yes-No with same Last_Error. I think around that time the server was taken off the network, could that confuse MySQL in some way?

    Read the article

  • What is the definition of "Big Data"?

    - by Ben
    Is there one? All the definitions I can find describe the size, complexity / variety or velocity of the data. Wikipedia's definition is the only one I've found with an actual number Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set. However, this seemingly contradicts the MIKE2.0 definition, referenced in the next paragraph, which indicates that "big" data can be small and that 100,000 sensors on an aircraft creating only 3GB of data could be considered big. IBM despite saying that: Big data is more simply than a matter of size. have emphasised size in their definition. O'Reilly has stressed "volume, velocity and variety" as well. Though explained well, and in more depth, the definition seems to be a re-hash of the others - or vice-versa of course. I think that a Computer Weekly article title sums up a number of articles fairly well "What is big data and how can it be used to gain competitive advantage". But ZDNet wins with the following from 2012: “Big Data” is a catch phrase that has been bubbling up from the high performance computing niche of the IT market... If one sits through the presentations from ten suppliers of technology, fifteen or so different definitions are likely to come forward. Each definition, of course, tends to support the need for that supplier’s products and services. Imagine that. Basically "big data" is "big" in some way shape or form. What is "big"? Is it quantifiable at the current time? If "big" is unquantifiable is there a definition that does not rely solely on generalities?

    Read the article

  • Embedded Javascript in Master Page or Pages That Use Master Page throw "Object Expected Error"

    - by Philter
    I'm semi-new to writing ASP.Net applications using master pages and I've run into an issue I've spent some time on but can't seem to solve. My situation is that I have a master page with a structure that looks like this: <head runat="server"> <title>Test Site</title> <link rel="Stylesheet" type="text/css" href="Default.css" /> <script type="text/javascript" language="javascript" src="js/Default.js" /> <meta http-equiv="Expires" content="0"/> <meta http-equiv="Cache-Control" content="no-cache"/> <meta http-equiv="Pragma" content="no-cache"/> <asp:ContentPlaceHolder ID="cphHead" runat="server"> </asp:ContentPlaceHolder> </head> <body> <form id="form1" runat="server"> <div id="divHeader"> <asp:ContentPlaceHolder ID="cphPageTitle" runat="server"></asp:ContentPlaceHolder> </div> <div id="divMainContent"> <asp:ContentPlaceHolder ID="cphMainContent" runat="server"></asp:ContentPlaceHolder> </div> </div> </form> </body> I then have a page that uses this master page that contains the following: <asp:Content ContentPlaceHolderID="cphHead" runat="server"> <script type="text/javascript" language="javascript" > function test() { alert("Hello World"); } </script> </asp:Content> <asp:Content ContentPlaceHolderID="cphMainContent" runat="server"> <fieldset> <img alt="Select As Of Date" src="Images/Calendar.png" id="aAsOfDate" class="clickable" runat="server" onclick="test();" /> <asp:Button runat="server" CssClass="buttonStyle" ID="btnSubmit" Text="Submit" OnClick="btnSubmit_Clicked"/> </fieldset> </asp:Content> When I run this page and click on the image I get an "Object Expected" error. However, if I place the test function into my Default.js external file it will function perfectly. I can't seem to figure out why this is happening. Any ideas?

    Read the article

  • Big Data – Operational Databases Supporting Big Data – Columnar, Graph and Spatial Database – Day 14 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the Key-Value Pair Databases and Document Databases in the Big Data Story. In this article we will understand the role of Columnar, Graph and Spatial Database supporting Big Data Story. Now we will see a few of the examples of the operational databases. Relational Databases (The day before yesterday’s post) NoSQL Databases (The day before yesterday’s post) Key-Value Pair Databases (Yesterday’s post) Document Databases (Yesterday’s post) Columnar Databases (Tomorrow’s post) Graph Databases (Today’s post) Spatial Databases (Today’s post) Columnar Databases  Relational Database is a row store database or a row oriented database. Columnar databases are column oriented or column store databases. As we discussed earlier in Big Data we have different kinds of data and we need to store different kinds of data in the database. When we have columnar database it is very easy to do so as we can just add a new column to the columnar database. HBase is one of the most popular columnar databases. It uses Hadoop file system and MapReduce for its core data storage. However, remember this is not a good solution for every application. This is particularly good for the database where there is high volume incremental data is gathered and processed. Graph Databases For a highly interconnected data it is suitable to use Graph Database. This database has node relationship structure. Nodes and relationships contain a Key Value Pair where data is stored. The major advantage of this database is that it supports faster navigation among various relationships. For example, Facebook uses a graph database to list and demonstrate various relationships between users. Neo4J is one of the most popular open source graph database. One of the major dis-advantage of the Graph Database is that it is not possible to self-reference (self joins in the RDBMS terms) and there might be real world scenarios where this might be required and graph database does not support it. Spatial Databases  We all use Foursquare, Google+ as well Facebook Check-ins for location aware check-ins. All the location aware applications figure out the position of the phone with the help of Global Positioning System (GPS). Think about it, so many different users at different location in the world and checking-in all together. Additionally, the applications now feature reach and users are demanding more and more information from them, for example like movies, coffee shop or places see. They are all running with the help of Spatial Databases. Spatial data are standardize by the Open Geospatial Consortium known as OGC. Spatial data helps answering many interesting questions like “Distance between two locations, area of interesting places etc.” When we think of it, it is very clear that handing spatial data and returning meaningful result is one big task when there are millions of users moving dynamically from one place to another place & requesting various spatial information. PostGIS/OpenGIS suite is very popular spatial database. It runs as a layer implementation on the RDBMS PostgreSQL. This makes it totally unique as it offers best from both the worlds. Courtesy: mushroom network Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Hive. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Can't save data for a member in a data form

    - by RahulS
    Implied sharing is an old thing everyone knows the reasons and solutions of that, still little theory about that: With Essbase implied sharing, some members are shared even if you do not explicitly set them as shared. These members are implied shared members. When an implied share relationship is created, each implied member assumes the other member’s value. Essbase assumes (or implies) a shared member relationship in these situations: 1. A parent has only one child 2. A parent has only one child that consolidates to the parent In a Planning form that contains members with an implied sharing relationship, when a value is added for the parent, the child assumes the same value after the form is saved. Likewise, if a value is added for the child, the parent usually assumes the same value after a form is saved.For example, when a calculation script or load rule populates an implied share member, the other implied share member assumes the value of the member populated by the calculation script or load rule. The last value calculated or imported takes precedence. The result is the same whether you refer to the parent or the child as a variable in a calculation script. For more information have a look at: http://docs.oracle.com/cd/E17236_01/epm.1112/hp_admin_11122/ch14s11.html Now the issue which we are going to talk about is We loose data on save even when the parent is dynamic calc and has a single child. A dynamic calc parent to a single child:  If we design the form with following selection: In the data form we will find parent below the member and this is by design whenever you make a selection using commands to select all the member below parent, always children will appear before the parent: Lets try to enter data, Save it Now, try to change the way we selected members Here we go: Now the question again why this behavior: 1. Data from Planning data form passes to Essbase row by row, 2. Because in data form the child member appears before the parent, 3. First, data goes to Essbase for child (SingleStoreChild), 4. Then when Planning passes the data for parent there was #Missing or No data,  5. Over writes the data to #missing. PS: As we know that dynamic calc members are calculated on the fly they are not allocated with any memory in the Essbase, here the parent was dynamic calc and it was pointing to same memory as child in the background, when Planning was passing data to Essbase for second row it has updated the child with missing data.(Little confusing, let me know if you need more explanation) 6. As one of the solutions just change the order of appearance of parent and child. Cheers..!!! Rahul S. https://www.facebook.com/pages/HyperionPlanning/117320818374228

    Read the article

  • SQL SERVER – Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 – Introduction to DQS

    - by pinaldave
    Data Quality Services is a very important concept of SQL Server. I have recently started to explore the same and I am really learning some good concepts. Here are two very important blog posts which one should go over before continuing this blog post. Installing Data Quality Services (DQS) on SQL Server 2012 Connecting Error to Data Quality Services (DQS) on SQL Server 2012 This article is introduction to Data Quality Services for beginners. We will be using an Excel file Click on the image to enlarge the it. In the first article we learned to install DQS. In this article we will see how we can learn about building Knowledge Base and using it to help us identify the quality of the data as well help correct the bad quality of the data. Here are the two very important steps we will be learning in this tutorial. Building a New Knowledge Base  Creating a New Data Quality Project Let us start the building the Knowledge Base. Click on New Knowledge Base. In our project we will be using the Excel as a knowledge base. Here is the Excel which we will be using. There are two columns. One is Colors and another is Shade. They are independent columns and not related to each other. The point which I am trying to show is that in Column A there are unique data and in Column B there are duplicate records. Clicking on New Knowledge Base will bring up the following screen. Enter the name of the new knowledge base. Clicking NEXT will bring up following screen where it will allow to select the EXCE file and it will also let users select the source column. I have selected Colors and Shade both as a source column. Creating a domain is very important. Here you can create a unique domain or domain which is compositely build from Colors and Shade. As this is the first example, I will create unique domain – for Colors I will create domain Colors and for Shade I will create domain Shade. Here is the screen which will demonstrate how the screen will look after creating domains. Clicking NEXT it will bring you to following screen where you can do the data discovery. Clicking on the START will start the processing of the source data provided. Pre-processed data will show various information related to the source data. In our case it shows that Colors column have unique data whereas Shade have non-unique data and unique data rows are only two. In the next screen you can actually add more rows as well see the frequency of the data as the values are listed unique. Clicking next will publish the knowledge base which is just created. Now the knowledge base is created. We will try to take any random data and attempt to do DQS implementation over it. I am using another excel sheet here for simplicity purpose. In reality you can easily use SQL Server table for the same. Click on New Data Quality Project to see start DQS Project. In the next screen it will ask which knowledge base to use. We will be using our Colors knowledge base which we have recently created. In the Colors knowledge base we had two columns – 1) Colors and 2) Shade. In our case we will be using both of the mappings here. User can select one or multiple column mapping over here. Now the most important phase of the complete project. Click on Start and it will make the cleaning process and shows various results. In our case there were two columns to be processed and it completed the task with necessary information. It demonstrated that in Colors columns it has not corrected any value by itself but in Shade value there is a suggestion it has. We can train the DQS to correct values but let us keep that subject for future blog posts. Now click next and keep the domain Colors selected left side. It will demonstrate that there are two incorrect columns which it needs to be corrected. Here is the place where once corrected value will be auto-corrected in future. I manually corrected the value here and clicked on Approve radio buttons. As soon as I click on Approve buttons the rows will be disappeared from this tab and will move to Corrected Tab. If I had rejected tab it would have moved the rows to Invalid tab as well. In this screen you can see how the corrected 2 rows are demonstrated. You can click on Correct tab and see previously validated 6 rows which passed the DQS process. Now let us click on the Shade domain on the left side of the screen. This domain shows very interesting details as there DQS system guessed the correct answer as Dark with the confidence level of 77%. It is quite a high confidence level and manual observation also demonstrate that Dark is the correct answer. I clicked on Approve and the row moved to corrected tab. On the next screen DQS shows the summary of all the activities. It also demonstrates how the correction of the quality of the data was performed. The user can explore their data to a SQL Server Table, CSV file or Excel. The user also has an option to either explore data and all the associated cleansing info or data only. I will select Data only for demonstration purpose. Clicking explore will generate the files. Let us open the generated file. It will look as following and it looks pretty complete and corrected. Well, we have successfully completed DQS Process. The process is indeed very easy. I suggest you try this out yourself and you will find it very easy to learn. In future we will go over advanced concepts. Are you using this feature on your production server? If yes, would you please leave a comment with your environment and business need. It will be indeed interesting to see where it is implemented. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Business Intelligence, Data Warehousing, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Data Quality Services, DQS

    Read the article

  • SQL SERVER – Why Do We Need Master Data Management – Importance and Significance of Master Data Management (MDM)

    - by pinaldave
    Let me paint a picture of everyday life for you.  Let’s say you and your wife both have address books for your groups of friends.  There is definitely overlap between them, so that you both have the addresses for your mutual friends, and there are addresses that only you know, and some only she knows.  They also might be organized differently.  You might list your friend under “J” for “Joe” or even under “W” for “Work,” while she might list him under “S” for “Joe Smith” or under your name because he is your friend.  If you happened to trade, neither of you would be able to find anything! This is where data management would be very important.  If you were to consolidate into one address book, you would have to set rules about how to organize the book, and both of you would have to follow them.  You would also make sure that poor Joe doesn’t get entered twice under “J” and under “S.” This might be a familiar situation to you, whether you are thinking about address books, record collections, books, or even shopping lists.  Wherever there is a lot of data to consolidate, you are going to run into problems unless everyone is following the same rules. I’m sure that my readers can figure out where I am going with this.  What is SQL Server but a computerized way to organize data?  And Microsoft is making it easier and easier to get all your “addresses” into one place.  In the  2008 version of SQL they introduced a new tool called Master Data Services (MDS) for Master Data Management, and they have improved it for the new 2012 version. MDM was hailed as a major improvement for business intelligence.  You might not think that an organizational system is terribly exciting, but think about the kind of “address books” a company might have.  Many companies have lots of important information, like addresses, credit card numbers, purchase history, and so much more.  To organize all this efficiently so that customers are well cared for and properly billed (only once, not never or multiple times!) is a major part of business intelligence. MDM comes into play because it will comb through these mountains of data and make sure that all the information is consistent, accurate, and all placed in one database so that employees don’t have to search high and low and waste their time. MDM also has operational MDM functions.  This is not a redundancy.  Operational MDM means that when one employee updates one bit of information in the database, for example – updating a new address for a customer, operational MDM ensures that this address is updated throughout the system so that all departments will have the correct information. Another cool thing about MDM is that it features Master Data Services Configuration Manager, which is exactly what it sounds like.  It has a built-in “helper” that lets you set up your database quickly, easily, and with the correct configurations.  While talking about cool features, I can’t skip over the add-in for Excel.  This allows you to link certain data to Excel files for easier sharing and uploading. In summary, I want to emphasize that the scariest part of the database is slowly disappearing.  Everyone knows that a database – one consolidated area for all your data – is a good idea, but the idea of setting one up is daunting.  But SQL Server is making data management easier and easier with features like Master Data Services (MDS). Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Master Data Services, MDM

    Read the article

  • Is Data Science “Science”?

    - by BuckWoody
    I hold the term “science” in very high esteem. I grew up on the Space Coast in Florida, and eventually worked at the Kennedy Space Center, surrounded by very intelligent people who worked in various scientific fields. Recently a new term has entered the computing dialog – “Data Scientist”. Since it’s not a standard term, it has a lot of definitions, and in fact has been disputed as a correct term. After all, the reasoning goes, if there’s no such thing as “Data Science” then how can there be a Data Scientist? This argument has been made before, albeit with a different term – “Computer Science”. In Peter Denning’s excellent article “Is Computer Science Science” (April  2005/Vol. 48, No. 4 COMMUNICATIONS OF THE ACM) there are many points that separate “science” from “engineering” and even “art”.  I won’t repeat the content of that article here (I recommend you read it on your own) but will leverage the points he makes there. Definition of Science To ask the question “is data science ‘science’” then we need to start with a definition of terms. Various references put the definition into the same basic areas: Study of the physical world Systematic and/or disciplined study of a subject area ...and then they include the things studied, the bodies of knowledge and so on. The word itself comes from Latin, and means merely “to know” or “to study to know”. Greek divides knowledge further into “truth” (episteme), and practical use or effects (tekhne). Normally computing falls into the second realm. Definition of Data Science And now a more controversial definition: Data Science. This term is so new and perhaps so niche that the major dictionaries haven’t yet picked it up (my OED reference is older – can’t afford to pop for the online registration at present). Researching the term's general use I created an amalgam of the definitions this way: “Studying and applying mathematical and other techniques to derive information from complex data sets.” Using this definition, data science certainly seems to be science - it's learning about and studying some object or area using systematic methods. But implicit within the definition is the word “application”, which makes the process more akin to engineering or even technology than science. In fact, I find that using these techniques – and data itself – part of science, not science itself. I leave out the concept of studying data patterns or algorithms as part of this discipline. That is actually a domain I see within research, mathematics or computer science. That of course is a type of science, but does not seek for practical applications. As part of the argument against calling it “Data Science”, some point to the scientific method of creating a hypothesis, testing with controls, testing results against the hypothesis, and documenting for repeatability.  These are not steps that we often take in working with data. We normally start with a question, and fit patterns and algorithms to predict outcomes and find correlations. In this way Data Science is more akin to statistics (and in fact makes heavy use of them) in the process rather than starting with an assumption and following on with it. So, is Data Science “Science”? I’m uncertain – and I’m uncertain it matters. Even if we are facing rampant “title inflation” these days (does anyone introduce themselves as a secretary or supervisor anymore?) I can tolerate the term at least from the intent that we use data to study problems across a wide spectrum, rather than restricting it to a single domain. And I also understand those who have worked hard to achieve the very honorable title of “scientist” who have issues with those who borrow the term without asking. What do you think? Science, or not? Does it matter?

    Read the article

  • Data Modeling Resources

    - by Dejan Sarka
    You can find many different data modeling resources. It is impossible to list all of them. I selected only the most valuable ones for me, and, of course, the ones I contributed to. Books Chris J. Date: An Introduction to Database Systems – IMO a “must” to understand the relational model correctly. Terry Halpin, Tony Morgan: Information Modeling and Relational Databases – meet the object-role modeling leaders. Chris J. Date, Nikos Lorentzos and Hugh Darwen: Time and Relational Theory, Second Edition: Temporal Databases in the Relational Model and SQL – all theory needed to manage temporal data. Louis Davidson, Jessica M. Moss: Pro SQL Server 2012 Relational Database Design and Implementation – the best SQL Server focused data modeling book I know by two of my friends. Dejan Sarka, et al.: MCITP Self-Paced Training Kit (Exam 70-441): Designing Database Solutions by Using Microsoft® SQL Server™ 2005 – SQL Server 2005 data modeling training kit. Most of the text is still valid for SQL Server 2008, 2008 R2, 2012 and 2014. Itzik Ben-Gan, Lubor Kollar, Dejan Sarka, Steve Kass: Inside Microsoft SQL Server 2008 T-SQL Querying – Steve wrote a chapter with mathematical background, and I added a chapter with theoretical introduction to the relational model. Itzik Ben-Gan, Dejan Sarka, Roger Wolter, Greg Low, Ed Katibah, Isaac Kunen: Inside Microsoft SQL Server 2008 T-SQL Programming – I added three chapters with theoretical introduction and practical solutions for the user-defined data types, dynamic schema and temporal data. Dejan Sarka, Matija Lah, Grega Jerkic: Training Kit (Exam 70-463): Implementing a Data Warehouse with Microsoft SQL Server 2012 – my first two chapters are about data warehouse design and implementation. Courses Data Modeling Essentials – I wrote a 3-day course for SolidQ. If you are interested in this course, which I could also deliver in a shorter seminar way, you can contact your closes SolidQ subsidiary, or, of course, me directly on addresses [email protected] or [email protected]. This course could also complement the existing courseware portfolio of training providers, which are welcome to contact me as well. Logical and Physical Modeling for Analytical Applications – online course I wrote for Pluralsight. Working with Temporal data in SQL Server – my latest Pluralsight course, where besides theory and implementation I introduce many original ways how to optimize temporal queries. Forthcoming presentations SQL Bits 12, July 17th – 19th, Telford, UK – I have a full-day pre-conference seminar Advanced Data Modeling Topics there.

    Read the article

  • Azure Mobile Services: lessons learned

    - by svdoever
    When I first started using Azure Mobile Services I thought of it as a nice way to: authenticate my users - login using Twitter, Google, Facebook, Windows Live create tables, and use the client code to create the columns in the table because that is not possible in the Azure Mobile Services UI run some Javascript code on the table crud actions (Insert, Update, Delete, Read) schedule a Javascript to run any 15 or more minutes I had no idea of the magic that was happening inside… where is the data stored? Is it a kind of big table, are relationships between tables possible? those Javascripts on the table crud actions, is that interpreted, what is that exactly? After working for some time with Azure Mobile Services I became a lot wiser: Those tables are just normal tables in an Azure SQL Server 2012 Creating the table columns through client code sucks, at least from my Javascript code, because the columns are deducted from the sent JSON data, and a datetime field is sent as string in JSON, so a string type column is created instead of a datetime column You can connect with SQL Management Studio to the Azure SQL Server, and although you can’t manage your columns through the SQL Management Studio UI, it is possible to just run SQL scripts to drop and create tables and indices When you create a table through SQL script, add the table with the same name in the Azure Mobile Services UI to hook it up and be able to access the table through the provided abstraction layer You can also go to the SQL Database through the Azure Mobile Services UI, and from there get in a web based SQL management studio where you can create columns and manage your data The table crud scripts and the scheduler scripts are full blown node.js scripts, introducing a lot of power with great performance The web based script editor is really powerful, I do most of my editing currently in the editor which has syntax highlighting and code completing. While editing the code JsHint is used for script validation. The documentation on Azure Mobile Services is… suboptimal. It is such a pity that there is no way to comment on it so the community could fill in the missing holes, like which node modules are already loaded, and which modules are available on Azure Mobile Services. Soon I was hacking away on Azure Mobile Services, creating my own database tables through script, and abusing the read script of an empty table named query to implement my own set of “services”. The latest updates to Azure Mobile Services described in the following posts added some great new features like creating web API’s, use shared code from your scripts, command line tools for managing Azure Mobile Services (upload and download scripts for example), support for node modules and git support: http://weblogs.asp.net/scottgu/archive/2013/06/14/windows-azure-major-updates-for-mobile-backend-development.aspx http://blogs.msdn.com/b/carlosfigueira/archive/2013/06/14/custom-apis-in-azure-mobile-services.aspx http://blogs.msdn.com/b/carlosfigueira/archive/2013/06/19/custom-api-in-azure-mobile-services-client-sdks.aspx In the mean time I rewrote all my “service-like” table scripts to API scripts, which works like a breeze. Bad thing with the current state of Azure Mobile Services is that the git support is not working if you are a co-administrator of your Azure subscription, and not and administrator (as in my case). Another bad thing is that Cross Origin Request Sharing (CORS) is not supported for the API yet, so no go yet from the browser client for API’s, which is my case. See http://social.msdn.microsoft.com/Forums/windowsazure/en-US/2b79c5ea-d187-4c2b-823a-3f3e0559829d/known-limitations-for-source-control-and-custom-api-features for more on these and other limitations. In his talk at Build 2013 Josh Twist showed that there is a work-around for accessing shared script code from the table scripts as well (another limitation mentioned in the post above). I could not find that code in the Votabl2 code example from the presentation at https://github.com/joshtwist/votabl2, but we can grab it from the presentation when it comes online on Channel9. By the way: you can always express your needs and ideas at http://mobileservices.uservoice.com, that’s the place they are listening to (I hope!).

    Read the article

  • Deploying Data Mining Models using Model Export and Import

    - by [email protected]
    In this post, we'll take a look at how Oracle Data Mining facilitates model deployment. After building and testing models, a next step is often putting your data mining model into a production system -- referred to as model deployment. The ability to move data mining model(s) easily into a production system can greatly speed model deployment, and reduce the overall cost. Since Oracle Data Mining provides models as first class database objects, models can be manipulated using familiar database techniques and technology. For example, one or more models can be exported to a flat file, similar to a database table dump file (.dmp). This file can be moved to a different instance of Oracle Database EE, and then imported. All methods for exporting and importing models are based on Oracle Data Pump technology and found in the DBMS_DATA_MINING package. Before performing the actual export or import, a directory object must be created. A directory object is a logical name in the database for a physical directory on the host computer. Read/write access to a directory object is necessary to access the host computer file system from within Oracle Database. For our example, we'll work in the DMUSER schema. First, DMUSER requires the privilege to create any directory. This is often granted through the sysdba account. grant create any directory to dmuser; Now, DMUSER can create the directory object specifying the path where the exported model file (.dmp) should be placed. In this case, on a linux machine, we have the directory /scratch/oracle. CREATE OR REPLACE DIRECTORY dmdir AS '/scratch/oracle'; If you aren't sure of the exact name of the model or models to export, you can find the list of models using the following query: select model_name from user_mining_models; There are several options when exporting models. We can export a single model, multiple models, or all models in a schema using the following procedure calls: BEGIN   DBMS_DATA_MINING.EXPORT_MODEL ('MY_MODEL.dmp','dmdir','name =''MY_DT_MODEL'''); END; BEGIN   DBMS_DATA_MINING.EXPORT_MODEL ('MY_MODELS.dmp','dmdir',              'name IN (''MY_DT_MODEL'',''MY_KM_MODEL'')'); END; BEGIN   DBMS_DATA_MINING.EXPORT_MODEL ('ALL_DMUSER_MODELS.dmp','dmdir'); END; A .dmp file can be imported into another schema or database using the following procedure call, for example: BEGIN   DBMS_DATA_MINING.IMPORT_MODEL('MY_MODELS.dmp', 'dmdir'); END; As with models from any data mining tool, when moving a model from one environment to another, care needs to be taken to ensure the transformations that prepare the data for model building are matched (with appropriate parameters and statistics) in the system where the model is deployed. Oracle Data Mining provides automatic data preparation (ADP) and embedded data preparation (EDP) to reduce, or possibly eliminate, the need to explicitly transport transformations with the model. In the case of ADP, ODM automatically prepares the data and includes the necessary transformations in the model itself. In the case of EDP, users can associate their own transformations with attributes of a model. These transformations are automatically applied when applying the model to data, i.e., scoring. Exporting and importing a model with ADP or EDP results in these transformations being immediately available with the model in the production system.

    Read the article

  • Setting up a new Silverlight 4 Project with WCF RIA Services

    - by Kevin Grossnicklaus
    Many of my clients are actively using Silverlight 4 and RIA Services to build powerful line of business applications.  Getting things set up correctly is critical to being to being able to take full advantage of the RIA services plumbing and when developers struggle with the setup they tend to shy away from the solution as a whole.  I’m a big proponent of RIA services and wanted to take the opportunity to share some of my experiences in setting up these types of projects.  In late 2010 I presented a RIA Services Master Class here in St. Louis, MO through my firm (ArchitectNow) and the information shared in this post was promised during that presentation. One other thing I want to mention before diving in is the existence of a number of other great posts on this subject.  I’ve learned a lot from many of them and wanted to call out a few of them.  The purpose of my post is to point out some of the gotchas that people get caught up on in the process but I would still encourage you to do as much additional research as you can to find the perfect setup for your needs. Here are a few additional blog posts and articles you should check out on the subject: http://msdn.microsoft.com/en-us/library/ee707351(VS.91).aspx http://adam-thompson.com/post/2010/07/03/Getting-Started-with-WCF-RIA-Services-for-Silverlight-4.aspx Technologies I don’t intend for this post to turn into a full WCF RIA Services tutorial but I did want to point out what technologies we will be using: Visual Studio.NET 2010 Silverlight 4.0 WCF RIA Services for Visual Studio 2010 Entity Framework 4.0 I also wanted to point out that the screenshots came from my personal development box which has a number of additional plug-ins and frameworks loaded so a few of the screenshots might not match 100% with what you see on your own machines. If you do not have Visual Studio 2010 you can download the express version from http://www.microsoft.com/express.  The Silverlight 4.0 tools and the WCF RIA Services components are installed via the Web Platform Installer (http://www.microsoft.com/web/download). Also, the examples given in this post are done in C#…sorry to you VB folks but the concepts are 100% identical. Setting up anew RIA Services Project This section will provide a step-by-step walkthrough of setting up a new RIA services project using a shared DLL for server side code and a simple Entity Framework model for data access.  All projects are created with the consistent ArchitectNow.RIAServices filename prefix and default namespace.  This would be modified to match your companies standards. First, open Visual Studio and open the new project window via File->New->Project.  In the New Project window, select the Silverlight folder in the Installed Templates section on the left and select “Silverlight Application” as your project type.  Verify your solution name and location are set appropriately.  Note that the project name we specified in the example below ends with .Client.  This indicates the name which will be given to our Silverlight project. I consider Silverlight a client-side technology and thus use this name to reflect that.  Click Ok to continue. During the creation on a new Silverlight 4 project you will be prompted with the following dialog to create a new web ASP.NET web project to host your Silverlight content.  As we are demonstrating the setup of a WCF RIA Services infrastructure, make sure the “Enable WCF RIA Services” option is checked and click OK.  Obviously, there are some other options here which have an effect on your solution and you are welcome to look around.  For our example we are going to leave the ASP.NET Web Application Project selected.  If you are interested in having your Silverlight project hosted in an MVC 2 application or a Web Site project these options are available as well.  Also, whichever web project type you select, the name can be modified here as well.  Note that it defaults to the same name as your Silverlight project with the addition of a .Web suffix. At this point, your full Silverlight 4 project and host ASP.NET Web Application should be created and will now display in your Visual Studio solution explorer as part of a single Visual Studio solution as follows: Now we want to add our WCF RIA Services projects to this same solution.  To do so, right-click on the Solution node in the solution explorer and select Add->New Project.  In the New Project dialog again select the Silverlight folder under the Visual C# node on the left and, in the main area of the screen, select the WCF RIA Services Class Library project template as shown below.  Make sure your project name is set appropriately as well.  For the sample below, we will name the project “ArchitectNow.RIAServices.Server.Entities”.   The .Server.Entities suffix we use is meant to simply indicate that this particular project will contain our WCF RIA Services entity classes (as you will see below).  Click OK to continue. Once you have created the WCF RIA Services Class Library specified above, Visual Studio will automatically add TWO projects to your solution.  The first will be an project called .Server.Entities (using our naming conventions) and the other will have the same name with a .Web extension.  The full solution (with all 4 projects) is shown in the image below.  The .Entities project will essentially remain empty and is actually a Silverlight 4 class library that will contain generated RIA Services domain objects.  It will be referenced by our front-end Silverlight project and thus allow for simplified sharing of code between the client and the server.   The .Entities.Web project is a .NET 4.0 class library into which we will put our data access code (via Entity Framework).  This is our server side code and business logic and the RIA Services plumbing will maintain a link between this project and the front end.  Specific entities such as our domain objects and other code we set to be shared will be copied automatically into the .Entities project to be used in both the front end and the back end. At this point, we want to do a little cleanup of the projects in our solution and we will do so by deleting the “Class1.cs” class from both the .Entities project and the .Entities.Web project.  (Has anyone ever intentionally named a class “Class1”?) Next, we need to configure a few references to make RIA Services work.  THIS IS A KEY STEP THAT CAUSES MANY HEADACHES FOR DEVELOPERS NEW TO THIS INFRASTRUCTURE! Using the Add References dialog in Visual Studio, add a project reference from the *.Client project (our Silverlight 4 client) to the *.Entities project (our RIA Services class library).  Next, again using the Add References dialog in Visual Studio, add a project reference from the *.Client.Web project (our ASP.NET host project) to the *.Entities.Web project (our back-end data services DLL).  To get to the Add References dialog, simply right-click on the project you with to add a reference to in the Visual Studio solution explorer and select “Add Reference” from the resulting context menu.  You will want to make sure these references are added as “Project” references to simplify your future debugging.  To reiterate the reference direction using the project names we have utilized in this example thus far:  .Client references .Entities and .Client.Web reference .Entities.Web.  If you have opted for a different naming convention, then the Silverlight project must reference the RIA Services Silverlight class library and the ASP.NET host project must reference the server-side class library. Next, we are going to add a new Entity Framework data model to our data services project (.Entities.Web).  We will do this by right clicking on this project (ArchitectNow.Server.Entities.Web in the above diagram) and selecting Add->New Project.  In the New Project dialog we will select ADO.NET Entity Data Model as in the following diagram.  For now we will call this simply SampleDataModel.edmx and click OK. It is worth pointing out that WCF RIA Services is in no way tied to the Entity Framework as a means of accessing data and any data access technology is supported (as long as the server side implementation maps to the RIA Services pattern which is a topic beyond the scope of this post).  We are using EF to quickly demonstrate the RIA Services concepts and setup infrastructure, as such, I am not providing a database schema with this post but am instead connecting to a small sample database on my local machine.  The following diagram shows a simple EF Data Model with two tables that I reverse engineered from a local data store.   If you are putting together your own solution, feel free to reverse engineer a few tables from any local database to which you have access. At this point, once you have an EF data model generated as an EDMX into your .Entites.Web project YOU MUST BUILD YOUR SOLUTION.  I know it seems strange to call that out but it important that the solution be built at this point for the next step to be successful.  Obviously, if you have any build errors, these must be addressed at this point. At this point we will add a RIA Services Domain Service to our .Entities.Web project (our server side code).  We will need to right-click on the .Entities.Web project and select Add->New Item.  In the Add New Item dialog, select Domain Service Class and verify the name of your new Domain Service is correct (ours is called SampleService.cs in the image below).  Next, click "Add”. After clicking “Add” to include the Domain Service Class in the selected project, you will be presented with the following dialog.  In it, you can choose which entities from the selected EDMX to include in your services and if they should be allowed to be edited (i.e. inserted, updated, or deleted) via this service.  If the “Available DataContext/ObjectContext classes” dropdown is empty, this indicates you have not yes successfully built your project after adding your EDMX.  I would also recommend verifying that the “Generate associated classes for metadata” option is selected.  Once you have selected the appropriate options, click “OK”. Once you have added the domain service class to the .Entities.Web project, the resulting solution should look similar to the following: Note that in the solution you now have a SampleDataModel.edmx which represents your EF data mapping to your database and a SampleService.cs which will contain a large amount of generated RIA Services code which RIA Services utilizes to access this data from the Silverlight front-end.  You will put all your server side data access code and logic into the SampleService.cs class.  The SampleService.metadata.cs class is for decorating the generated domain objects with attributes from the System.ComponentModel.DataAnnotations namespace for validation purposes. FINAL AND KEY CONFIGURATION STEP!  One key step that causes significant headache to developers configuring RIA Services for the first time is the fact that, when we added the EDMX to the .Entities.Web project for our EF data access, a connection string was generated and placed within a newly generated App.Context file within that project.  While we didn’t point it out at the time you can see it in the image above.  This connection string will be required for the EF data model to successfully locate it’s data.  Also, when we added the Domain Service class to the .Entities.Web project, a number of RIA Services configuration options were added to the same App.Config file.   Unfortunately, when we ultimately begin to utilize the RIA Services infrastructure, our Silverlight UI will be making RIA services calls through the ASP.NET host project (i.e. .Client.Web).  This host project has a reference to the .Entities.Web project which actually contains the code so all will pass through correctly EXCEPT the fact that the host project will utilize it’s own Web.Config for any configuration settings.  For this reason we must now merge all the sections of the App.Config file in the .Entities.Web project into the Web.Config file in the .Client.Web project.  I know this is a bit tedious and I wish there were a simpler solution but it is required for our RIA Services Domain Service to be made available to the front end Silverlight project.  Much of this manual merge can be achieved by simply cutting and pasting from App.Config into Web.Config.  Unfortunately, the <system.webServer> section will exist in both and the contents of this section will need to be manually merged.  Fortunately, this is a step that needs to be taken only once per solution.  As you add additional data structures and Domain Services methods to the server no additional changes will be necessary to the Web.Config. Next Steps At this point, we have walked through the basic setup of a simple RIA services solution.  Unfortunately, there is still a lot to know about RIA services and we have not even begun to take advantage of the plumbing which we just configured (meaning we haven’t even made a single RIA services call).  I plan on posting a few more introductory posts over the next few weeks to take us to this step.  If you have any questions on the content in this post feel free to reach out to me via this Blog and I’ll gladly point you in (hopefully) the right direction. Resources Prior to closing out this post, I wanted to share a number or resources to help you get started with RIA services.  While I plan on posting more on the subject, I didn’t invent any of this stuff and wanted to give credit to the following areas for helping me put a lot of these pieces into place.   The books and online resources below will go a long way to making you extremely productive with RIA services in the shortest time possible.  The only thing required of you is the dedication to take advantage of the resources available. Books Pro Business Applications with Silverlight 4 http://www.amazon.com/Pro-Business-Applications-Silverlight-4/dp/1430272074/ref=sr_1_2?ie=UTF8&qid=1291048751&sr=8-2 Silverlight 4 in Action http://www.amazon.com/Silverlight-4-Action-Pete-Brown/dp/1935182374/ref=sr_1_1?ie=UTF8&qid=1291048751&sr=8-1 Pro Silverlight for the Enterprise (Books for Professionals by Professionals) http://www.amazon.com/Pro-Silverlight-Enterprise-Books-Professionals/dp/1430218673/ref=sr_1_3?ie=UTF8&qid=1291048751&sr=8-3 Web Content RIA Services http://channel9.msdn.com/Blogs/RobBagby/NET-RIA-Services-in-5-Minutes http://silverlight.net/riaservices/ http://www.silverlight.net/learn/videos/all/net-ria-services-intro/ http://www.silverlight.net/learn/videos/all/ria-services-support-visual-studio-2010/ http://channel9.msdn.com/learn/courses/Silverlight4/SL4BusinessModule2/SL4LOB_02_01_RIAServices http://www.myvbprof.com/MainSite/index.aspx#/zSL4_RIA_01 http://channel9.msdn.com/blogs/egibson/silverlight-firestarter-ria-services http://msdn.microsoft.com/en-us/library/ee707336%28v=VS.91%29.aspx Silverlight www.silverlight.net http://msdn.microsoft.com/en-us/silverlight4trainingcourse.aspx http://channel9.msdn.com/shows/silverlighttv

    Read the article

  • Exploring the Excel Services REST API

    - by jamiet
    Over the last few years Analysis Services guru Chris Webb and I have been on something of a crusade to enable better access to data that is locked up in countless Excel workbooks that litter the hard drives of enterprise PCs. The most prominent manifestation of that crusade up to now has been a forum thread that Chris began on Microsoft Answers entitled Excel Web App API? Chris began that thread with: I was wondering whether there was an API for the Excel Web App? Specifically, I was wondering if it was possible (or if it will be possible in the future) to expose data in a spreadsheet in the Excel Web App as an OData feed, in the way that it is possible with Excel Services? Up to recently the last 10 words of that paragraph "in the way that it is possible with Excel Services" had completely washed over me however a comment on my recent blog post Thoughts on ExcelMashup.com (and a rant) by Josh Booker in which Josh said: Excel Services is a service application built for sharepoint 2010 which exposes a REST API for excel documents. We're looking forward to pros like you giving it a try now that Office365 makes sharepoint more easily accessible.  Can't wait for your future blog about using REST API to load data from Excel on Offce 365 in SSIS. made me think that perhaps the Excel Services REST API is something I should be looking into and indeed that is what I have been doing over the past few days. And you know what? I'm rather impressed with some of what Excel Services' REST API has to offer. Unfortunately Excel Services' REST API also has one debilitating aspect that renders this blog post much less useful than it otherwise would be; namely that it is not publicly available from the Excel Web App on SkyDrive. Therefore all I can do in this blog post is show you screenshots of what the REST API provides in Sharepoint rather than linking you directly to those REST resources; that's a great shame because one of the benefits of a REST API is that it is easily and ubiquitously demonstrable from a web browser. Instead I am hosting a workbook on Sharepoint in Office 365 because that does include Excel Services' REST API but, again, all I can do is show you screenshots. N.B. If anyone out there knows how to make Office-365-hosted spreadsheets publicly-accessible (i.e. without requiring a username/password) please do let me know (because knowing which forum on which to ask the question is an exercise in futility). In order to demonstrate Excel Services' REST API I needed some decent data and for that I used the World Tourism Organization Statistics Database and Yearbook - United Nations World Tourism Organization dataset hosted on Azure Datamarket (its free, by the way); this dataset "provides comprehensive information on international tourism worldwide and offers a selection of the latest available statistics on international tourist arrivals, tourism receipts and expenditure" and you can explore the data for yourself here. If you want to play along at home by viewing the data as it exists in Excel then it can be viewed here. Let's dive in.   The root of Excel Services' REST API is the model resource which resides at: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model Note that this is true for every workbook hosted in a Sharepoint document library - each Excel workbook is a RESTful resource. (Update: Mark Stacey on Twitter tells me that "It's turned off by default in onpremise Sharepoint (1 tickbox to turn on though)". Thanks Mark!) The data is provided as an ATOM feed but I have Firefox's feed reading ability turned on so you don't see the underlying XML goo. As you can see there are four top level resources, Ranges, Charts, Tables and PivotTables; exploring one of those resources is where things get interesting. Let's take a look at the Tables Resource: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables Our workbook contains only one table, called ‘Table1’ (to reiterate, you can explore this table yourself here). Viewing that table via the REST API is pretty easy, we simply append the name of the table onto our previous URI: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables('Table1') As you can see, that quite simply gives us a representation of the data in that table. What you cannot see from this screenshot is that this is pure HTML that is being served up; that is all well and good but actually we can do more interesting things. If we specify that the data should be returned not as HTML but as: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables('Table1')?$format=image then that data comes back as a pure image and can be used in any web page where you would ordinarily use images. This is the thing that I really like about Excel Services’ REST API – we can embed an image in any web page but instead of being a copy of the data, that image is actually live – if the underlying data in the workbook were to change then hitting refresh will show a new image. Pretty cool, no? The same is true of any Charts or Pivot Tables in your workbook - those can be embedded as images too and if the underlying data changes, boom, the image in your web page changes too. There is a lot of data in the workbook so the image returned by that previous URI is too large to show here so instead let’s take a look at a different resource, this time a range: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Ranges('Data!A1|C15') That URI returns cells A1 to C15 from a worksheet called “Data”: And if we ask for that as an image again: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Ranges('Data!A1|C15')?$format=image Were this image resource not behind a username/password then this would be a live image of the data in the workbook as opposed to one that I had to copy and upload elsewhere. Nonetheless I hope this little wrinkle doesn't detract from the inate value of what I am trying to articulate here; that an existing image in a web page can be changed on-the-fly simply by inserting some data into an Excel workbook. I for one think that that is very cool indeed! I think that's enough in the way of demo for now as this shows what is possible using Excel Services' REST API. Of course, not all features work quite how I would like and here is a bulleted list of some of my more negative feedback: The URIs are pig-ugly. Are "_vti_bin" & "ExcelRest.aspx" really necessary as part of the URI? Would this not be better: http://server/Documents/TourismExpenditureInMillionsOfUSD.xlsx/Model/Tables(‘Table1’) That URI provides the necessary addressability and is a lot easier to remember. Discoverability of these resources is not easy, we essentially have to handcrank a URI ourselves. Take the example of embedding a chart into a blog post - would it not be better if I could browse first through the document library to an Excel workbook and THEN through the workbook to the chart/range/table that I am interested in? Call it a wizard if you like. That would be really cool and would, I am sure, promote this feature and cut down on the copy-and-paste disease that the REST API is meant to alleviate. The resources that I demonstrated can be returned as feeds as well as images or HTML simply by changing the format parameter to ?$format=atom however for some inexplicable reason they don't return OData and no-one on the Excel Services team can tell me why (believe me, I have asked). $format is an OData parameter however other useful parameters such as $top and $filter are not supported. It would be nice if they were. Although I haven't demonstrated it here Excel Services' REST API does provide a makeshift way of altering the data by changing the value of specific cells however what it does not allow you to do is add new data into the workbook. Google Docs allows this and was one of the motivating factors for Chris Webb's forum post that I linked to above. None of this works for Excel workbooks hosted on SkyDrive This blog post is as long as it needs to be for a short introduction so I'll stop now. If you want to know more than I recommend checking out a few links: Excel Services REST API documentation on MSDNSo what does REST on Excel Services look like??? by Shahar PrishExcel Services in SharePoint 2010 REST API Syntax by Christian Stich. Any thoughts? Let's hear them in the comments section below! @Jamiet 

    Read the article

  • Exploring the Excel Services REST API

    - by jamiet
    Over the last few years Analysis Services guru Chris Webb and I have been on something of a crusade to enable better access to data that is locked up in countless Excel workbooks that litter the hard drives of enterprise PCs. The most prominent manifestation of that crusade up to now has been a forum thread that Chris began on Microsoft Answers entitled Excel Web App API? Chris began that thread with: I was wondering whether there was an API for the Excel Web App? Specifically, I was wondering if it was possible (or if it will be possible in the future) to expose data in a spreadsheet in the Excel Web App as an OData feed, in the way that it is possible with Excel Services? Up to recently the last 10 words of that paragraph "in the way that it is possible with Excel Services" had completely washed over me however a comment on my recent blog post Thoughts on ExcelMashup.com (and a rant) by Josh Booker in which Josh said: Excel Services is a service application built for sharepoint 2010 which exposes a REST API for excel documents. We're looking forward to pros like you giving it a try now that Office365 makes sharepoint more easily accessible.  Can't wait for your future blog about using REST API to load data from Excel on Offce 365 in SSIS. made me think that perhaps the Excel Services REST API is something I should be looking into and indeed that is what I have been doing over the past few days. And you know what? I'm rather impressed with some of what Excel Services' REST API has to offer. Unfortunately Excel Services' REST API also has one debilitating aspect that renders this blog post much less useful than it otherwise would be; namely that it is not publicly available from the Excel Web App on SkyDrive. Therefore all I can do in this blog post is show you screenshots of what the REST API provides in Sharepoint rather than linking you directly to those REST resources; that's a great shame because one of the benefits of a REST API is that it is easily and ubiquitously demonstrable from a web browser. Instead I am hosting a workbook on Sharepoint in Office 365 because that does include Excel Services' REST API but, again, all I can do is show you screenshots. N.B. If anyone out there knows how to make Office-365-hosted spreadsheets publicly-accessible (i.e. without requiring a username/password) please do let me know (because knowing which forum on which to ask the question is an exercise in futility). In order to demonstrate Excel Services' REST API I needed some decent data and for that I used the World Tourism Organization Statistics Database and Yearbook - United Nations World Tourism Organization dataset hosted on Azure Datamarket (its free, by the way); this dataset "provides comprehensive information on international tourism worldwide and offers a selection of the latest available statistics on international tourist arrivals, tourism receipts and expenditure" and you can explore the data for yourself here. If you want to play along at home by viewing the data as it exists in Excel then it can be viewed here. Let's dive in.   The root of Excel Services' REST API is the model resource which resides at: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model Note that this is true for every workbook hosted in a Sharepoint document library - each Excel workbook is a RESTful resource. (Update: Mark Stacey on Twitter tells me that "It's turned off by default in onpremise Sharepoint (1 tickbox to turn on though)". Thanks Mark!) The data is provided as an ATOM feed but I have Firefox's feed reading ability turned on so you don't see the underlying XML goo. As you can see there are four top level resources, Ranges, Charts, Tables and PivotTables; exploring one of those resources is where things get interesting. Let's take a look at the Tables Resource: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables Our workbook contains only one table, called ‘Table1’ (to reiterate, you can explore this table yourself here). Viewing that table via the REST API is pretty easy, we simply append the name of the table onto our previous URI: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables('Table1') As you can see, that quite simply gives us a representation of the data in that table. What you cannot see from this screenshot is that this is pure HTML that is being served up; that is all well and good but actually we can do more interesting things. If we specify that the data should be returned not as HTML but as: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables('Table1')?$format=image then that data comes back as a pure image and can be used in any web page where you would ordinarily use images. This is the thing that I really like about Excel Services’ REST API – we can embed an image in any web page but instead of being a copy of the data, that image is actually live – if the underlying data in the workbook were to change then hitting refresh will show a new image. Pretty cool, no? The same is true of any Charts or Pivot Tables in your workbook - those can be embedded as images too and if the underlying data changes, boom, the image in your web page changes too. There is a lot of data in the workbook so the image returned by that previous URI is too large to show here so instead let’s take a look at a different resource, this time a range: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Ranges('Data!A1|C15') That URI returns cells A1 to C15 from a worksheet called “Data”: And if we ask for that as an image again: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Ranges('Data!A1|C15')?$format=image Were this image resource not behind a username/password then this would be a live image of the data in the workbook as opposed to one that I had to copy and upload elsewhere. Nonetheless I hope this little wrinkle doesn't detract from the inate value of what I am trying to articulate here; that an existing image in a web page can be changed on-the-fly simply by inserting some data into an Excel workbook. I for one think that that is very cool indeed! I think that's enough in the way of demo for now as this shows what is possible using Excel Services' REST API. Of course, not all features work quite how I would like and here is a bulleted list of some of my more negative feedback: The URIs are pig-ugly. Are "_vti_bin" & "ExcelRest.aspx" really necessary as part of the URI? Would this not be better: http://server/Documents/TourismExpenditureInMillionsOfUSD.xlsx/Model/Tables(‘Table1’) That URI provides the necessary addressability and is a lot easier to remember. Discoverability of these resources is not easy, we essentially have to handcrank a URI ourselves. Take the example of embedding a chart into a blog post - would it not be better if I could browse first through the document library to an Excel workbook and THEN through the workbook to the chart/range/table that I am interested in? Call it a wizard if you like. That would be really cool and would, I am sure, promote this feature and cut down on the copy-and-paste disease that the REST API is meant to alleviate. The resources that I demonstrated can be returned as feeds as well as images or HTML simply by changing the format parameter to ?$format=atom however for some inexplicable reason they don't return OData and no-one on the Excel Services team can tell me why (believe me, I have asked). $format is an OData parameter however other useful parameters such as $top and $filter are not supported. It would be nice if they were. Although I haven't demonstrated it here Excel Services' REST API does provide a makeshift way of altering the data by changing the value of specific cells however what it does not allow you to do is add new data into the workbook. Google Docs allows this and was one of the motivating factors for Chris Webb's forum post that I linked to above. None of this works for Excel workbooks hosted on SkyDrive This blog post is as long as it needs to be for a short introduction so I'll stop now. If you want to know more than I recommend checking out a few links: Excel Services REST API documentation on MSDNSo what does REST on Excel Services look like??? by Shahar PrishExcel Services in SharePoint 2010 REST API Syntax by Christian Stich. Any thoughts? Let's hear them in the comments section below! @Jamiet 

    Read the article

  • Why Oracle Data Integrator for Big Data?

    - by Mala Narasimharajan
    Big Data is everywhere these days - but what exactly is it? It’s data that comes from a multitude of sources – not only structured data, but unstructured data as well.  The sheer volume of data is mindboggling – here are a few examples of big data: climate information collected from sensors, social media information, digital pictures, log files, online video files, medical records or online transaction records.  These are just a few examples of what constitutes big data.   Embedded in big data is tremendous value and being able to manipulate, load, transform and analyze big data is key to enhancing productivity and competitiveness.  The value of big data lies in its propensity for greater in-depth analysis and data segmentation -- in turn giving companies detailed information on product performance, customer preferences and inventory.  Furthermore, by being able to store and create more data in digital form, “big data can unlock significant value by making information transparent and usable at much higher frequency." (McKinsey Global Institute, May 2011) Oracle's flagship product for bulk data movement and transformation, Oracle Data Integrator, is a critical component of Oracle’s Big Data strategy. ODI provides automation, bulk loading, and validation and transformation capabilities for Big Data while minimizing the complexities of using Hadoop.  Specifically, the advantages of ODI in a Big Data scenario are due to pre-built Knowledge Modules that drive processing in Hadoop. This leverages the graphical UI to load and unload data from Hadoop, perform data validations and create mapping expressions for transformations.  The Knowledge Modules provide a key jump-start and eliminate a significant amount of Hadoop development.  Using Oracle Data Integrator together with Oracle Big Data Connectors, you can simplify the complexities of mapping, accessing, and loading big data (via NoSQL or HDFS) but also correlating your enterprise data – this correlation may require integrating across heterogeneous and standards-based environments, connecting to Oracle Exadata, or sourcing via a big data platform such as Oracle Big Data Appliance. To learn more about Oracle Data Integration and Big Data, download our resource kit to see the latest in whitepapers, webinars, downloads, and more… or go to our website on www.oracle.com/bigdata

    Read the article

  • General Policies and Procedures for Maintaining the Value of Data Assets

    Here is a general list for policies and procedures regarding maintaining the value of data assets. Data Backup Policies and Procedures Backups are very important when dealing with data because there is always the chance of losing data due to faulty hardware or a user activity. So the need for a strategic backup system should be mandatory for all companies. This being said, in the real world some companies that I have worked for do not really have a good data backup plan. Typically when companies tend to take this kind of approach in data backups usually the data is not really recoverable.  Unfortunately when companies do not regularly test their backup plans they get a false sense of security because they think that they are covered. However, I can tell you from personal and professional experience that a backup plan/system is never fully implemented until it is regularly tested prior to the time when it actually needs to be used. Disaster Recovery Plan Expanding on Backup Policies and Procedures, a company needs to also have a disaster recovery plan in order to protect its data in case of a catastrophic disaster.  Disaster recovery plans typically encompass how to restore all of a company’s data and infrastructure back to a restored operational status.  Most Disaster recovery plans also include time estimates on how long each step of the disaster recovery plan should take to be executed.  It is important to note that disaster recovery plans are never fully implemented until they have been tested just like backup plans. Disaster recovery plans should be tested regularly so that the business can be confident in not losing any or minimal data due to a catastrophic disaster. Firewall Policies and Content Filters One way companies can protect their data is by using a firewall to separate their internal network from the outside. Firewalls allow for enabling or disabling network access as data passes through it by applying various defined restrictions. Furthermore firewalls can also be used to prevent access from the internal network to the outside by these same factors. Common Firewall Restrictions Destination/Sender IP Address Destination/Sender Host Names Domain Names Network Ports Companies can also desire to restrict what their network user’s view on the internet through things like content filters. Content filters allow a company to track what webpages a person has accessed and can also restrict user’s access based on established rules set up in the content filter. This device and/or software can block access to domains or specific URLs based on a few factors. Common Content Filter Criteria Known malicious sites Specific Page Content Page Content Theme  Anti-Virus/Mal-ware Polices Fortunately, most companies utilize antivirus programs on all computers and servers for good reason, virus have been known to do the following: Corrupt/Invalidate Data, Destroy Data, and Steal Data. Anti-Virus applications are a great way to prevent any malicious application from being able to gain access to a company’s data.  However, anti-virus programs must be constantly updated because new viruses are always being created, and the anti-virus vendors need to distribute updates to their applications so that they can catch and remove them. Data Validation Policies and Procedures Data validation is very important to ensure that only accurate information is stored. The existence of invalid data can cause major problems when businesses attempt to use data for knowledge based decisions and for performance reporting. Data Scrubbing Policies and Procedures Data scrubbing is valuable to companies in one of two ways. The first can be used to clean data prior to being analyzed for report generation. The second is that it allows companies to remove things like personally Identifiable information from its data prior to transmit it between multiple environments or if the information is sent to an external location. An example of this can be seen with medical records in regards to HIPPA laws that prohibit the storage of specific personal and medical information. Additionally, I have professionally run in to a scenario where the Canadian government does not allow any Canadian’s personal information to be stored on a server not located in Canada. Encryption Practices The use of encryption is very valuable when a company needs to any personal information. This allows users with the appropriated access levels to view or confirm the existence or accuracy of data within a system by either decrypting the information or encrypting a piece of data and comparing it to the stored version.  Additionally, if for some unforeseen reason the data got in to the wrong hands then they would have to first decrypt the data before they could even be able to read it. Encryption just adds and additional layer of protection around data itself. Standard Normalization Practices The use of standard data normalization practices is very important when dealing with data because it can prevent allot of potential issues by eliminating the potential for unnecessary data duplication. Issues caused by data duplication include excess use of data storage, increased chance for invalidated data, and over use of data processing. Network and Database Security/Access Policies Every company has some form of network/data access policy even if they have none. These policies help secure data from being seen by inappropriate users along with preventing the data from being updated or deleted by users. In addition, without a good security policy there is a large potential for data to be corrupted by unassuming users or even stolen. Data Storage Policies Data storage polices are very important depending on how they are implemented especially when a company is trying to utilize them in conjunction with other policies like Data Backups. I have worked at companies where all network user folders are constantly backed up, and if a user wanted to ensure the existence of a piece of data in the form of a file then they had to store that file in their network folder. Conversely, I have also worked in places where when a user logs on or off of the network there entire user profile is backed up. Training Policies One of the biggest ways to prevent data loss and ensure that data will remain a company asset is through training. The practice of properly train employees on how to work with in systems that access data is crucial when trying to ensure a company’s data will remain an asset. Users need to be trained on how to manipulate a company’s data in order to perform their tasks to reduce the chances of invalidating data.

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >