big data analytics - Page 41

Is A Web App Feasible For A Heavy Use Data Entry System?

- by Rob

Looking for opinions on this, we're working on a project that is essentially a data entry system for a production line. Heavy data input by users who normally work in Excel or other thick client data systems. We've been told (as a consequence) that we have to develop this as a thick client using .NET. Our argument was to develop as a web app, as it resolves a lot of issues and would be easier to write and maintain. Their argument against the web is that (supposedly) the web is not ready yet for a heavy duty data entry system, and that the web in a browser does not offer the speed, responsiveness, and fluid experience for the end-user that a thick client can (citing things such as drag and drop, rapid auto-entry and data navigation, etc.) Personally, I think that with good form design and JQuery/AJAX, a web app could do everything a thick client does just as well, and they just don't know what they're talking about. The irony is that a thick client has to go to a lot more effort to manage the deployment and connectivity back to the central data server than a web app would need to do, so in terms of speed I would expect a web app to be faster. What are the thoughts of those out there? Are there any technologies currently in production use that modern data entry systems are being developed as web apps in? Appreciate any feedback. Regards, Rob.

Read the article

Pulling Data out of an object in Javascript

- by PerryCS

I am having a problem retreiving data out of an object passed back from PHP. I've tried many different ways to access this data and none work. In Firebug I see the following... (it looks nicer in Firebug) - I tried to make this look as close to Firebug as possible results Object { data="{"formName":"form3","formData":"data goes here"}", phpLiveDebug="<...s: 198.91.215.227"} data "{"formName":"form3","formData":"data goes here"}" phpLiveDebug "<...s: 198.91.215.227" I can access phpLiveDebug no problem, but the data portion is an object. I have tried the following... success: function(results) { //$("#formName").val(results.data.formName); //$("#formName").val(results.data[0].formName); //$("#formName").val(results.data[0]); //$("#formName").val(results.data[1]); //$("#formName").val(results.data[0]["formName"]); var tmp = results.data[formName]; alert("!" + tmp + "!"); $("#formName").val(tmp); $("#jqueryPHPDebug").val(results.phpLiveDebug); } This line works in the example above... $("#jqueryPHPDebug").val(results.phpLiveDebug); but... I can't figure out how to get at the data inside the results.data portion... as you can see above, I have been trying different things and more not even listed there. I was really hoping this line would work :) var tmp = results.data[formName]; But it doesn't. So, after many days of reading, tinkering, my solution was to re-write it to return data similar to the phpLiveDebug but then I thought... it's gotta be something simple I'm overlooking... Thank you for your time. Please try and explain why my logic (my horrible attempts at trying to figure out the proper method) above is wrong if you can?

Read the article

MaxTotalSizeInBytes - Blind spots in Usage file and Web Analytics Reports

- by Gino Abraham

Originally posted on: http://geekswithblogs.net/GinoAbraham/archive/2013/10/28/maxtotalsizeinbytes---blind-spots-in-usage-file-and-web-analytics.aspx http://blogs.msdn.com/b/sharepoint_strategery/archive/2012/04/16/usage-file-and-web-analytics-reports-with-blind-spots.aspx In my previous post (Troubleshooting SharePoint 2010 Web Analytics), I referenced a problem that can occur when exceeding the daily partition size for the LoggingDB, which generates the ULS message “[Partition] has exceeded the max bytes”. Below, I wanted to provide some additional info on this particular issue and help identify some options if this occurs. As an aside, this post only applies if you are missing portions of Usage data - think blind spots on intermittent days or user activity regularly sparse for the afternoon/evening. If this fits your scenario - read on. But if Usage logs are outright missing, go check out my Troubleshooting post first. Background on the problem:The LoggingDB database has a default maximum size of ~6GB. However, SharePoint evenly splits this total size into fixed sized logical partitions – and the number of partitions is defined by the number of days to retain Usage data (by default 14 days). In this case, 14 partitions would be created to account for the 14 days of retention. If the retention were halved to 7 days, the LoggingDBwould be split into 7 corresponding partitions at twice the size. In other words, the partition size is generally defined as [max size for DB] / [number of retention days].Going back to the default scenario, the “max size” for the LoggingDB is 6200000000 bytes (~6GB) and the retention period is 14 days. Using our formula, this would be [~6GB] / [14 days], which equates to 444858368 bytes (~425MB) per partition per day. Again, if the retention were halved to 7 days (which halves the number of partitions), the resulting partition size becomes [~6GB] / [7 days], or ~850MB per partition.From my experience, when the partition size for any given day is exceeded, the usage logging for the remainder of the day is essentially thrown away because SharePoint won’t allow any more to be written to that day’s partition. The only clue that this is occurring (beyond truncated usage data) is an error such as the following that gets reported in the ULS:04/08/2012 09:30:04.78 OWSTIMER.EXE (0x1E24) 0x2C98 SharePoint Foundation Health i0m6 High Table RequestUsage_Partition12 has 444858368 bytes that has exceeded the max bytes 444858368It’s also worth noting that the exact bytes reported (e.g. ‘444858368’ above) may slightly vary among farms. For example, you may instead see 445226812, 439123456, or something else in the ballpark. The exact number itself doesn't matter, but this error message intends to indicates that the reporting usage has exceeded the partition size for the given day.What it means:The error itself is easy to miss, which can lead to substantial gaps in the reporting data (your mileage may vary) if not identified. At this point, I can only advise to periodically check the ULS logs for this message. Down the road, I plan to explore if [Developing a Custom Health Rule] could be leveraged to identify the issue (If you've ever built Custom Health Rules, I'd be interested to hear about your experiences). Overcoming this issue also poses a challenge, with workaround options including:Lower the retentionBecause the partition size is generally defined as [max size] / [number of retention days], the first option is to lower the number of days to retain the data – the lower the retention, the lower the divisor and thus a bigger partition. For example, halving the retention from 14 to 7 days would halve the number of partitions, but double the partition size to ~850MB (e.g. [6200000000 bytes] / [7 days] = ~850GB partitions). Lowering it to 2 days would result in two ~3GB partitions… and so on.Recreate the LoggingDB with an increased sizeThe property MaxTotalSizeInBytes is exposed by OM code for the SPUsageDefinition object and can be updated with the example PowerShell snippet below. However, updating this value has no immediate impact because this size only applies when creating a LoggingDB. Therefore, you must create a newLoggingDB for the Usage Service Application. The gotcha: this effectively deletes all prior Usage databecause the Usage Service Application can only have a single LoggingDB.Here is an example snippet to update the "Page Requests" Usage Definition:$def=Get-SPUsageDefinition -Identity "page requests" $def.MaxTotalSizeInBytes=12400000000 $def.update()Create a new Logging database and attach to the Usage Service Application using the following command: Get-spusageapplication | Set-SPUsageApplication -DatabaseServer <dbServer> -DatabaseName <newDBname> Updated (5/10/2012): Once the new database has been created, you can confirm the setting has truly taken by running the following SQL Query (be sure to replace the database name in the following query with the name provided in the PowerShell above)SELECT * FROM [WSS_UsageApplication].[dbo].[Configuration] WITH (nolock) WHERE ConfigName LIKE 'Max Total Bytes - RequestUsage'

Read the article

RPi and Java Embedded GPIO: Big Data and Java Technology

- by hinkmond

Java Embedded and Big Data go hand-in-hand, especially as demonstrated by prototyping on a Raspberry Pi to show how well the Java Embedded platform can perform on a small embedded device which then becomes the proof-of-concept for industrial controllers, medical equipment, networking gear or any type of sensor-connected device generating large amounts of data. The key is a fast and reliable way to access that data using Java technology. In the previous blog posts you've seen the integration of a static electricity sensor and the Raspberry Pi through the GPIO port, then accessing that data through Java Embedded code. It's important to point out how this works and why it works well with Java code. First, the version of Linux (Debian Wheezy/Raspian) that is found on the RPi has a very convenient way to access the GPIO ports through the use of Linux OS managed file handles. This is key in avoiding terrible and complex coding using register manipulation in C code, or having to program in a less elegant and clumsy procedural scripting language such as python. Instead, using Java Embedded, allows a fast way to access those GPIO ports through those same Linux file handles. Java already has a very easy to program way to access file handles with a high degree of performance that matches direct access of those file handles with the Linux OS. Using the Java API java.io.FileWriter lets us open the same file handles that the Linux OS has for accessing the GPIO ports. Then, by first resetting the ports using the unexport and export file handles, we can initialize them for easy use in a Java app. // Open file handles to GPIO port unexport and export controls FileWriter unexportFile = new FileWriter("/sys/class/gpio/unexport"); FileWriter exportFile = new FileWriter("/sys/class/gpio/export"); ... // Reset the port unexportFile.write(gpioChannel); unexportFile.flush(); // Set the port for use exportFile.write(gpioChannel); exportFile.flush(); Then, another set of file handles can be used by the Java app to control the direction of the GPIO port by writing either "in" or "out" to the direction file handle. // Open file handle to input/output direction control of port FileWriter directionFile = new FileWriter("/sys/class/gpio/gpio" + gpioChannel + "/direction"); // Set port for input directionFile.write("in"); // Or, use "out" for output directionFile.flush(); And, finally, a RandomAccessFile handle can be used with a high degree of performance on par with native C code (only milliseconds to read in data and write out data) with low overhead (unlike python) to manipulate the data going in and out on the GPIO port, while the object-oriented nature of Java programming allows for an easy way to construct complex analytic software around that data access functionality to the external world. RandomAccessFile[] raf = new RandomAccessFile[GpioChannels.length]; ... // Reset file seek pointer to read latest value of GPIO port raf[channum].seek(0); raf[channum].read(inBytes); inLine = new String(inBytes); It's Big Data from sensors and industrial/medical/networking equipment meeting complex analytical software on a small constraint device (like a Linux/ARM RPi) where Java Embedded allows you to shine as an Embedded Device Software Designer. Hinkmond

Read the article

Using Content Analytics for More Effective Engagement

- by Kellsey Ruppel

Using Content Analytics for More Effective Engagement: Turning High-Volume Content into Templates for Success By Mitchell Palski, Oracle WebCenter Sales Consultant Many organizations use Oracle WebCenter Portal to develop these basic types of portals: Intranet portals used for collaboration, employee self-service, and company communication Extranet portals used by customers and partners for self-service and support Team collaboration portals that allow users to share documents and content, track activity, and engage in discussions Portals are intended to provide a personalized, single point of interaction with web-based applications and information. The user experiences that a Portal is capable of displaying should be relevant to an individual user or class of users (a group or role). The components of a Portal that would vary based on a user’s identity include: Web content such as images, news articles, and on-screen instruction Social tools such as threaded discussions, polls/surveys, and blogs Document management tools to upload, download, and edit files Web applications that present data visualizations and data entry modules These collections of content, tools, and applications make up valuable workspaces. The challenge that a development team may have is defining which combinations are the most effective for its users. No one wants to create and manage a workspace that goes un-used or (even worse) that is used but is ineffective. Oracle WebCenter Portal provides you with the capabilities to not only rapidly develop variations of portals, but also identify which portals are the most effective and should be re-used throughout an enterprise. Capturing Portal AnalyticsOracle WebCenter Portal provides an analytics service that allows administrators and business users to track and analyze portal usage. These analytics are captured in the form of: Usage tracking metrics Behavior tracking User Profile Correlation The out-of-the-box task reports that come with Oracle WebCenter Portal include: WebCenter Portal Traffic Page Traffic Login Metrics Portlet Traffic Portlet Response Time Portlet Instance Traffic Portlet Instance Response Time Search Metrics Document Metrics Wiki Metrics Blog Metrics Discussion Metrics Portal Traffic Portal Response Time By determining the usage and behavior tracking metrics that are associated with specific user profiles (including groups and roles), your administrators will be able to identify the components of your solution that are the most valuable. Your first step as an administrator should be to identify the specific pages and/or components are used the most frequently. Next, determine the user(s) or user-group(s) that are accessing those high-use elements of a portal. It is also important to determine patterns in high-usage and see if they correlate to a specific schedule. One of the goals of any development team (especially those that are following Agile methodologies) should be to develop reusable web components to minimize redundant development. Oracle WebCenter Portal provides you the tools to capture the successful workspaces that have already been developed and identified so that they can be reused for similar user demographics. Re-using Successful PortalsWhen creating a new Portal in Oracle WebCenter, developers have the option to base that portal on a template that includes: Pre-seeded data such as pages, tools, user roles, and look-and-feel assets Specific sub-sets of page-layouts, tools, and other resources to standardize what is added to a Portal’s pages Any custom components that your team creates during development cycles Once you have identified a successful workspace and its most valuable components, leverage Oracle WebCenter’s ability to turn that custom portal into a portal template. By creating a template from your already successful portal, you are empowering your enterprise by providing a starting point for future initiatives. Your new projects, new teams, and new web pages can benefit from lessons learned and adjustments that have already been made to optimize user experiences instead of starting from scratch. ***For a complete explanation of how to work with Portal Templates, be sure to read the Fusion Middleware documentation available online.

Read the article

Big O complexity

- by Codenotguru

what is Big Oh notation mean and when an algorithm has logarithmic complexity?? for example how is quick sort O(nlogn)???

Read the article

Importing Data from Google Analytics

- by Adam Tannon

I am planning on building a web app with many different public-facing HTTP servers; each of which will have Google Analytics (GA) installed on them. I'd like to create a "dashboard" app that consolidates the GA data into one screen. I've been perusing the documentation for this so-called GA API, but I can't tell what the end result of the GA API is: Does the GA API allow me to do exactly what I am looking for it to do? Or... Does the GA API do something entirely different (like allow me to share my data with Google+ or something else weird) Since an API can be used to CRUD any kind of data, I guess I'm asking which way the GA API goes: is it for querying (reading) data from 1+ server instances, or is it for modifying data on those servers or somewhere else? Thanks in advance!

Read the article

Standard ratio of cookies to "visitors"?

- by Jeff Atwood

As noted in a recent blog post, We see a large discrepancy between Google Analytics "visitors" and Quantcast "visitors". Also, for reasons we have never figured out, Google Analytics just gets larger numbers than Quantcast. Right now GA is showing more visitors (15 million) on stackoverflow.com alone than Quantcast sees on the whole network (14 million): Why? I don’t know. Either Google Analytics loses cookies sometimes, or Quantcast misses visitors. Counting is an inexact science. We think this is because Quantcast uses a more conservative ratio of cookies-to-visitors. Whereas Google Analytics might consider every cookie a "visitor", Quantcast will only consider every 1.24 cookies a "visitor". This makes sense to me, as people may access our sites from multiple computers, multiple browsers, etcetera. I have two closely related questions: Is there an accepted standard ratio of cookies to visitors? This is obviously an inexact science, but is there any emerging rule of thumb? Is there any more accurate way to count "visitors" to a website other than relying on browser cookies? Or is this just always going to be kind of a best-effort estimation crapshoot no matter how you measure it?

Read the article

Building dynamic OLAP data marts on-the-fly

- by DrJohn

At the forthcoming SQLBits conference, I will be presenting a session on how to dynamically build an OLAP data mart on-the-fly. This blog entry is intended to clarify exactly what I mean by an OLAP data mart, why you may need to build them on-the-fly and finally outline the steps needed to build them dynamically. In subsequent blog entries, I will present exactly how to implement some of the techniques involved. What is an OLAP data mart? In data warehousing parlance, a data mart is a subset of the overall corporate data provided to business users to meet specific business needs. Of course, the term does not specify the technology involved, so I coined the term "OLAP data mart" to identify a subset of data which is delivered in the form of an OLAP cube which may be accompanied by the relational database upon which it was built. To clarify, the relational database is specifically create and loaded with the subset of data and then the OLAP cube is built and processed to make the data available to the end-users via standard OLAP client tools. Why build OLAP data marts? Market research companies sell data to their clients to make money. To gain competitive advantage, market research providers like to "add value" to their data by providing systems that enhance analytics, thereby allowing clients to make best use of the data. As such, OLAP cubes have become a standard way of delivering added value to clients. They can be built on-the-fly to hold specific data sets and meet particular needs and then hosted on a secure intranet site for remote access, or shipped to clients' own infrastructure for hosting. Even better, they support a wide range of different tools for analytical purposes, including the ever popular Microsoft Excel. Extension Attributes: The Challenge One of the key challenges in building multiple OLAP data marts based on the same 'template' is handling extension attributes. These are attributes that meet the client's specific reporting needs, but do not form part of the standard template. Now clearly, these extension attributes have to come into the system via additional files and ultimately be added to relational tables so they can end up in the OLAP cube. However, processing these files and filling dynamically altered tables with SSIS is a challenge as SSIS packages tend to break as soon as the database schema changes. There are two approaches to this: (1) dynamically build an SSIS package in memory to match the new database schema using C#, or (2) have the extension attributes provided as name/value pairs so the file's schema does not change and can easily be loaded using SSIS. The problem with the first approach is the complexity of writing an awful lot of complex C# code. The problem of the second approach is that name/value pairs are useless to an OLAP cube; so they have to be pivoted back into a proper relational table somewhere in the data load process WITHOUT breaking SSIS. How this can be done will be part of future blog entry. What is involved in building an OLAP data mart? There are a great many steps involved in building OLAP data marts on-the-fly. The key point is that all the steps must be automated to allow for the production of multiple OLAP data marts per day (i.e. many thousands, each with its own specific data set and attributes). Now most of these steps have a great deal in common with standard data warehouse practices. The key difference is that the databases are all built to order. The only permanent database is the metadata database (shown in orange) which holds all the metadata needed to build everything else (i.e. client orders, configuration information, connection strings, client specific requirements and attributes etc.). The staging database (shown in red) has a short life: it is built, populated and then ripped down as soon as the OLAP Data Mart has been populated. In the diagram below, the OLAP data mart comprises the two blue components: the Data Mart which is a relational database and the OLAP Cube which is an OLAP database implemented using Microsoft Analysis Services (SSAS). The client may receive just the OLAP cube or both components together depending on their reporting requirements. So, in broad terms the steps required to fulfil a client order are as follows: Step 1: Prepare metadata Create a set of database names unique to the client's order Modify all package connection strings to be used by SSIS to point to new databases and file locations. Step 2: Create relational databases Create the staging and data mart relational databases using dynamic SQL and set the database recovery mode to SIMPLE as we do not need the overhead of logging anything Execute SQL scripts to build all database objects (tables, views, functions and stored procedures) in the two databases Step 3: Load staging database Use SSIS to load all data files into the staging database in a parallel operation Load extension files containing name/value pairs. These will provide client-specific attributes in the OLAP cube. Step 4: Load data mart relational database Load the data from staging into the data mart relational database, again in parallel where possible Allocate surrogate keys and use SSIS to perform surrogate key lookup during the load of fact tables Step 5: Load extension tables & attributes Pivot the extension attributes from their native name/value pairs into proper relational tables Add the extension attributes to the views used by OLAP cube Step 6: Deploy & Process OLAP cube Deploy the OLAP database directly to the server using a C# script task in SSIS Modify the connection string used by the OLAP cube to point to the data mart relational database Modify the cube structure to add the extension attributes to both the data source view and the relevant dimensions Remove any standard attributes that not required Process the OLAP cube Step 7: Backup and drop databases Drop staging database as it is no longer required Backup data mart relational and OLAP database and ship these to the client's infrastructure Drop data mart relational and OLAP database from the build server Mark order complete Start processing the next order, ad infinitum. So my future blog posts and my forthcoming session at the SQLBits conference will all focus on some of the more interesting aspects of building OLAP data marts on-the-fly such as handling the load of extension attributes and how to dynamically alter the structure of an OLAP cube using C#.

Read the article

In google analytics, what methods can I use to track visits from one specific computer?

- by calumbrodie

If I have access to local machines where I can change the http user agent, add custom cookies for each machine etc. Is there a way I can configure google analytics to recognise and filter traffic from each of these sources?

Read the article

NRF Big Show 2011 -- Part 3

- by David Dorf

I'm back from the NRF show having been one of the lucky people who's flight was not canceled. The show was very crowded with a reported 20% increase in attendance and everyone seemed in high spirits. After two years of sluggish retail sales, things are really picking up and it was reflected in everyone's mood. The pop-up Disney Store in the Oracle booth was great and attracted lots of interest in their mobile POS. I know many attendees visited the Disney Store in Times Square to see the entire operation. It's an impressive two-story store that keeps kids engaged. The POS demonstration station, where most of our innovations were demoed, was always crowded. Unfortunately most of the demos used WiFi and the signals from other booths prevented anything from working reliably. Nevertheless, the demo team did an excellent job walking people through the scenarios and explaining how shopping is being impacted by mobile, analytics, and RFID. Big Show Links Disney uncovers its store magic Top 10 Things You Missed at the NRF Big Show 2011 Oracle Retail Stores Innovation Station at NRF Big Show 2011 (video) The buzz of the show was again around mobile solutions. Several companies are creating mobile POS using the iPod Touch, including integrations to Oracle POS for the following retailers: Disney Stores with InfoGain Victoria's Secret with InfoGain Urban Outfitters with Starmount The Gap with Global Bay Keeping with the mobile theme, the NRF release a revised version of their Mobile Blueprint at NRF. It will be posted to the NRF site very soon. The alternate payments section had a major rewrite that provides a great overview and proximity and remote payment technologies. NRF Mobile Blueprint Links New mobile blueprint provides fresh insights NRF Mobile Blueprint 2011 (slides) I hope to do some posts on some of the interesting companies I spoke with in the coming weeks.

Read the article

The Importance of Collaboration, Analytics, and Mobile Technologies for Modern HR

- by HCM-Oracle

It was 17 years ago, when a McKinsey study uncovered the “war for talent”. Today, it is no point of contention that a strong talent-centric strategy maybe the most important focus for organizations. A talent-centric organization aims at recruiting, retaining and developing the best talent. The best employees will be able to adapt responsibilities and be able to come up with solutions to solve problems, which are important skills in today’s dynamic work environment, and arguably more important in this recessionary climate. The notion of hiring and retaining talented employees for organizational sustainability and competitive advantage is not a new concept. But can organizations consider themselves as having a “talent-centric” strategy without up-to-date collaboration tools, HR analytics and mobile technologies in pursuit of attracting, hiring and retaining the best talent? Attend the Upcoming Webcast A webcast on June 19th at 3pm EST will reveal more results of the study. Based on original research done in collaboration between Oracle HCM and HCI, we unveil new findings that explore how critical collaboration, analytic insights and mobile technology are for supporting a talent-centric work environment. You will learn: What are the benefits to being talent-centric? How does collaboration via social networks, analytics with predictive insights and mobile technologies support the talent-centric strategy of an organization? What is the state of play for these technologies? Register Here

Read the article

Business Analytics Monthly Index - October 2013

- by p.anda

Starting from this post we are providing a monthly summary. This provides a quick look at what has been happening in our Proactive Support Blog over the last month. Welcome to the first Monthly Index posting! Please let us know what you think and your suggestions are most welcome ... Oracle Business Analytics - Blog Monthly Index - October 2013 General Summary Link Introducing the Business Analytics Proactive Support Team - Outlining the Proactive Support Team function View Business Intelligence (BI) Summary Link OBIEE version 11.1.1.7.131017 has been released - Links to the latest OBIEE release information & downloads View Update to OBIEE Chrome 30 issue - Information for patch release for OBIEE Chrome issue View OBIEE problems with Chrome (update 30) - Highlight OBIEE 11.1.1.7.1 issue with latest Google Chrome update 30 View OBIEE 11.1.1.7.1 Sample App (V309 R2) released - Link and Information about the current OBIEE Sample App View OBIEE - APEX integration - An article discussing the OBIEE APEX Integration View Enterprise Performance Management (EPM) Summary Link Hyperion Smartview Assistance - Information & resources for Hyperion Smartview inc. OBIEE integration View Java update alert: issue with EAS 11.1.2.3 - Advisory of recent Java release and identified EAS problem + workaround View EPM troubleshooting Utilities - Outlining additional resources for troubleshooting EPM View EPM Infrastructure Tuning Guide released - Link to the EPM Infrastructure Tuning Guide (v.11.1.2.2 / 11.1.2.3) View Essbase - FormatString - Discussing Essbase "Typed Measures" View October EPM patch set updates released - Links to the October Patches for EPM View featuring - the DRM blog - Featuring one of our co-blogs that is very beneficial View Advisor Webcast Summary Link Advisor Webcast: EPM 11.1.2.3 new features in Financial Applications - Announcement for AW: New Features in FA (recording post presentation via Doc ID 1456233.1 | Archived 2013) View Advisor Webcast: Troubleshooting Discoverer editions - AW: Discussing Discover Logs/Tracing/EUL Status Workbooks & more. (recording post presentation via Doc ID 1456233.1 | Archived 2013) View

Read the article

Data that has been deleted in P6, how is it updated in Analytics

- by Jeffrey McDaniel

In P6 Reporting Database 2.0 the ETL process looked to the refrdel table in the P6 PMDB to determine which projects were deleted. The refrdel table could not be cleared out between ETL runs or those deletes would be lost. After the ETL process is run the refrdel can be cleared out. It is important to keep any purging of the refrdel in a consistent cycle so the ETL process can pick up these deletes and process them accordingly. In P6 Reporting Database 2.2 and higher the Extended Schema is used as the data source. In the Extended Schema, deleted data is filtered out by the views. The Extended Schema services will handle any interaction with the refrdel table, this concern with timing refrdel cleanup and ETL runs is not applicable as of this release. In the Extended Schema tables (ex. TaskX) there can still be deleted data present. The Extended Schema views join on the primary PMDB tables (ex. Task) and filter out any deleted data. Any data that was deleted that remains in the Extended Schema tables can be cleaned out at a designated time by running the clean up procedure as documented in the P6 Extended Schema white paper. This can be run occasionally but is not necessary to run often unless large amounts of data has been deleted.

Read the article

The Best Data Integration for Exadata Comes from Oracle

- by maria costanzo

Oracle Data Integrator and Oracle GoldenGate offer unique and optimized data integration solutions for Oracle Exadata. For example, customers that choose to feed their data warehouse or reporting database with near real-time throughout the day, can do so without decreasing performance or availability of source and target systems. And if you ask why real-time, the short answer is: in today’s fast-paced, always-on world, business decisions need to use more relevant, timely data to be able to act fast and seize opportunities. A longer response to "why real-time" question can be found in a related blog post. If we look at the solution architecture, as shown on the diagram below, Oracle Data Integrator and Oracle GoldenGate are both uniquely designed to take full advantage of the power of the database and to eliminate unnecessary middle-tier components. Oracle Data Integrator (ODI) is the best bulk data loading solution for Exadata. ODI is the only ETL platform that can leverage the full power of Exadata, integrate directly on the Exadata machine without any additional hardware, and by far provides the simplest setup and fastest overall performance on an Exadata system. We regularly see customers achieving a 5-10 times boost when they move their ETL to ODI on Exadata. For some companies the performance gain is even much higher. For example a large insurance company did a proof of concept comparing ODI vs a traditional ETL tool (one of the market leaders) on Exadata. The same process that was taking 5hrs and 11 minutes to complete using the competing ETL product took 7 minutes and 20 seconds with ODI. Oracle Data Integrator was 42 times faster than the conventional ETL when running on Exadata.This shows that Oracle's own data integration offering helps you to gain the most out of your Exadata investment with a truly optimized solution. GoldenGate is the best solution for streaming data from heterogeneous sources into Exadata in real time. Oracle GoldenGate can also be used together with Data Integrator for hybrid use cases that also demand non-invasive capture, high-speed real time replication. Oracle GoldenGate enables real-time data feeds from heterogeneous sources non-invasively, and delivers to the staging area on the target Exadata system. ODI runs directly on Exadata to use the database engine power to perform in-database transformations. Enterprise Data Quality is integrated with Oracle Data integrator and enables ODI to load trusted data into the data warehouse tables. Only Oracle can offer all these technical benefits wrapped into a single intelligence data warehouse solution that runs on Exadata. Compared to traditional ETL with add-on CDC this solution offers: § Non-invasive data capture from heterogeneous sources and avoids any performance impact on source § No mid-tier; set based transformations use database power § Mini-batches throughout the day –or- bulk processing nightly which means maximum availability for the DW § Integrated solution with Enterprise Data Quality enables leveraging trusted data in the data warehouse In addition to Starwood Hotels and Resorts, Morrison Supermarkets, United Kingdom’s fourth-largest food retailer, has seen the power of this solution for their new BI platform and shared their story with us. Morrisons needed to analyze data across a large number of manufacturing, warehousing, retail, and financial applications with the goal to achieve single view into operations for improved customer service. The retailer deployed Oracle GoldenGate and Oracle Data Integrator to bring new data into Oracle Exadata in near real-time and replicate the data into reporting structures within the data warehouse—extending visibility into operations. Using Oracle's data integration offering for Exadata, Morrisons produced financial reports in seconds, rather than minutes, and improved staff productivity and agility. You can read more about Morrison’s success story here and hear from Starwood here. From an Irem Radzik article.

Read the article

How can I use domain masking without having self referral in Google Analytics

- by Cdore

I have one old domain that points to a website's server's ip (let's call it www.oldsite.com). I have a new one, www.newsite.com, that is set up to be forwarded to a specific page on the website. Due to the way the host of newsite.com places the website in a frame, in Google Analystics, the newsite.com is listed as a source rather than the source they were at before hand, causing a self referral. A solution is to edit the code of the iframe as I looked up, but there's no way to really edit the host's masking source code of course. Another solution I did previously was have www.newsite.com point to the address that www.oldsite.come pointed to. It solved the analytics problems, but in exchange, the url masking no longer worked. In the address bar, it came up as www.oldsite.com. Is there a way to make me have url masking and be able to forward to agree with google analytics? The server of the website is hosted on a cloud server, if this is anymore information.

Read the article

Data access pattern

- by andlju

I need some advice on what kind of pattern(s) I should use for pushing/pulling data into my application. I'm writing a rule-engine that needs to hold quite a large amount of data in-memory in order to be efficient enough. I have some rather conflicting requirements; It is not acceptable for the engine to always have to wait for a full pre-load of all data before it is functional. Only fetching and caching data on-demand will lead to the engine taking too long before it is running quickly enough. An external event can trigger the need for specific parts of the data to be reloaded. Basically, I think I need a combination of pushing and pulling data into the application. A simplified version of my current "pattern" looks like this (in psuedo-C# written in notepad): // This interface is implemented by all classes that needs the data interface IDataSubscriber { void RegisterData(Entity data); } // This interface is implemented by the data access class interface IDataProvider { void EnsureLoaded(Key dataKey); void RegisterSubscriber(IDataSubscriber subscriber); } class MyClassThatNeedsData : IDataSubscriber { IDataProvider _provider; MyClassThatNeedsData(IDataProvider provider) { _provider = provider; _provider.RegisterSubscriber(this); } public void RegisterData(Entity data) { // Save data for later StoreDataInCache(data); } void UseData(Key key) { // Make sure that the data has been stored in cache _provider.EnsureLoaded(key); Entity data = GetDataFromCache(key); } } class MyDataProvider : IDataProvider { List<IDataSubscriber> _subscribers; // Make sure that the data for key has been loaded to all subscribers public void EnsureLoaded(Key key) { if (HasKeyBeenMarkedAsLoaded(key)) return; PublishDataToSubscribers(key); MarkKeyAsLoaded(key); } // Force all subscribers to get a new version of the data for key public void ForceReload(Key key) { PublishDataToSubscribers(key); MarkKeyAsLoaded(key); } void PublishDataToSubscribers(Key key) { Entity data = FetchDataFromStore(key); foreach(var subscriber in _subscribers) { subscriber.RegisterData(data); } } } // This class will be spun off on startup and should make sure that all data is // preloaded as quickly as possible class MyPreloadingThread { IDataProvider _provider; MyPreloadingThread(IDataProvider provider) { _provider = provider; } void RunInBackground() { IEnumerable<Key> allKeys = GetAllKeys(); foreach(var key in allKeys) { _provider.EnsureLoaded(key); } } } I have a feeling though that this is not necessarily the best way of doing this.. Just the fact that explaining it seems to take two pages feels like an indication.. Any ideas? Any patterns out there I should have a look at?

Read the article

Data access pattern, combining push and pull?

- by andlju

I need some advice on what kind of pattern(s) I should use for pushing/pulling data into my application. I'm writing a rule-engine that needs to hold quite a large amount of data in-memory in order to be efficient enough. I have some rather conflicting requirements; It is not acceptable for the engine to always have to wait for a full pre-load of all data before it is functional. Only fetching and caching data on-demand will lead to the engine taking too long before it is running quickly enough. An external event can trigger the need for specific parts of the data to be reloaded. Basically, I think I need a combination of pushing and pulling data into the application. A simplified version of my current "pattern" looks like this (in psuedo-C# written in notepad): // This interface is implemented by all classes that needs the data interface IDataSubscriber { void RegisterData(Entity data); } // This interface is implemented by the data access class interface IDataProvider { void EnsureLoaded(Key dataKey); void RegisterSubscriber(IDataSubscriber subscriber); } class MyClassThatNeedsData : IDataSubscriber { IDataProvider _provider; MyClassThatNeedsData(IDataProvider provider) { _provider = provider; _provider.RegisterSubscriber(this); } public void RegisterData(Entity data) { // Save data for later StoreDataInCache(data); } void UseData(Key key) { // Make sure that the data has been stored in cache _provider.EnsureLoaded(key); Entity data = GetDataFromCache(key); } } class MyDataProvider : IDataProvider { List<IDataSubscriber> _subscribers; // Make sure that the data for key has been loaded to all subscribers public void EnsureLoaded(Key key) { if (HasKeyBeenMarkedAsLoaded(key)) return; PublishDataToSubscribers(key); MarkKeyAsLoaded(key); } // Force all subscribers to get a new version of the data for key public void ForceReload(Key key) { PublishDataToSubscribers(key); MarkKeyAsLoaded(key); } void PublishDataToSubscribers(Key key) { Entity data = FetchDataFromStore(key); foreach(var subscriber in _subscribers) { subscriber.RegisterData(data); } } } // This class will be spun off on startup and should make sure that all data is // preloaded as quickly as possible class MyPreloadingThread { IDataProvider _provider; MyPreloadingThread(IDataProvider provider) { _provider = provider; } void RunInBackground() { IEnumerable<Key> allKeys = GetAllKeys(); foreach(var key in allKeys) { _provider.EnsureLoaded(key); } } } I have a feeling though that this is not necessarily the best way of doing this.. Just the fact that explaining it seems to take two pages feels like an indication.. Any ideas? Any patterns out there I should have a look at?

Read the article

Import csv data (SDK iphone)

- by Ni

I am new to cocoa. I have been working on these stuff for a few days. For the following code, i can read all the data in the string, and successfully get the data for plot. NSMutableArray *contentArray = [NSMutableArray array]; NSString *filePath = @"995,995,995,995,995,995,995,995,1000,997,995,994,992,993,992,989,988,987,990,993,989"; NSArray *myText = [filePath componentsSeparatedByString:@","]; NSInteger idx; for (idx = 0; idx < myText.count; idx++) { NSString *data =[myText objectAtIndex:idx]; NSLog(@"%@", data); id x = [NSNumber numberWithFloat:0+idx*0.002777778]; id y = [NSDecimalNumber decimalNumberWithString:data]; [contentArray addObject: [NSMutableDictionary dictionaryWithObjectsAndKeys:x, @"x", y, @"y", nil]]; } self.dataForPlot = contentArray; then, i try to load the data from csv file. the data in Data.csv file has the same value and the same format as 995,995,995,995,995,995,995,995,1000,997,995,994,992,993,992,989,988,987,990,993,989. I run the code, it is supposed to give the same graph output. however, it seems that the data is not loaded from csv file successfully. i can not figure out what's wrong with my code. NSMutableArray *contentArray = [NSMutableArray array]; NSString *filePath = [[NSBundle mainBundle] pathForResource:@"Data" ofType:@"csv"]; NSString *Data = [NSString stringWithContentsOfFile:filePath encoding:NSUTF8StringEncoding error:nil ]; if (Data) { NSArray *myText = [Data componentsSeparatedByString:@","]; NSInteger idx; for (idx = 0; idx < myText.count; idx++) { NSString *data =[myText objectAtIndex:idx]; NSLog(@"%@", data); id x = [NSNumber numberWithFloat:0+idx*0.002777778]; id y = [NSDecimalNumber decimalNumberWithString:data]; [contentArray addObject: [NSMutableDictionary dictionaryWithObjectsAndKeys:x, @"x", y, @"y",nil]]; } self.dataForPlot = contentArray; } The only difference is NSString *filePath = [[NSBundle mainBundle] pathForResource:@"Data" ofType:@"csv"]; NSString *Data = [NSString stringWithContentsOfFile:filePath encoding:NSUTF8StringEncoding error:nil ]; if (data){ } did i do anything wrong here?? Thanks for your help!!!!

Read the article

Analyze big human database

- by Neir0

Lets we have a big people database. Each human has a many parameters: age, weight, favorite music, favorite films, education etc. I want to know how one feature associate with other features. For example, if human has a good education what it means for musical preferences? Or how films preferences changes with age? I know about assotian rules algorithms like apriory but i donnt want just to found assotiation rules, i want to know how one specific feature affect to others. Which keywords i must to use for google?

Read the article

POST data not being received

- by Alexander

I've got an iPhone App that is supposed to send POST data to my server to register the device in a MySQL database so we can send notifications etc... to it. It sends it's unique identifier, device name, token, and a few other small things like passwords and usernames as a POST request to our server. The problem is that sometimes the server doesn't receive the data. And by this I mean, its not just receiving blank values for the POST inputs but, its not receiving ANY post data at all. I am logging all POST inputs to my server into some log files and when the script that relies on the POST data from the device fails (detects no data) I notice that its because NO POST data was sent. Is this a problem on the server, like refusing data or something or does this have to be on the client's side? What could be causing this?

Read the article

Building Simple Workflows in Oozie

- by dan.mcclary

Introduction More often than not, data doesn't come packaged exactly as we'd like it for analysis. Transformation, match-merge operations, and a host of data munging tasks are usually needed before we can extract insights from our Big Data sources. Few people find data munging exciting, but it has to be done. Once we've suffered that boredom, we should take steps to automate the process. We want codify our work into repeatable units and create workflows which we can leverage over and over again without having to write new code. In this article, we'll look at how to use Oozie to create a workflow for the parallel machine learning task I described on Cloudera's site. Hive Actions: Prepping for Pig In my parallel machine learning article, I use data from the National Climatic Data Center to build weather models on a state-by-state basis. NCDC makes the data freely available as gzipped files of day-over-day observations stretching from the 1930s to today. In reading that post, one might get the impression that the data came in a handy, ready-to-model files with convenient delimiters. The truth of it is that I need to perform some parsing and projection on the dataset before it can be modeled. If I get more observations, I'll want to retrain and test those models, which will require more parsing and projection. This is a good opportunity to start building up a workflow with Oozie. I store the data from the NCDC in HDFS and create an external Hive table partitioned by year. This gives me flexibility of Hive's query language when I want it, but let's me put the dataset in a directory of my choosing in case I want to treat the same data with Pig or MapReduce code. CREATE EXTERNAL TABLE IF NOT EXISTS historic_weather(column 1, column2) PARTITIONED BY (yr string) STORED AS ... LOCATION '/user/oracle/weather/historic'; As new weather data comes in from NCDC, I'll need to add partitions to my table. That's an action I should put in the workflow. Similarly, the weather data requires parsing in order to be useful as a set of columns. Because of their long history, the weather data is broken up into fields of specific byte lengths: x bytes for the station ID, y bytes for the dew point, and so on. The delimiting is consistent from year to year, so writing SerDe or a parser for transformation is simple. Once that's done, I want to select columns on which to train, classify certain features, and place the training data in an HDFS directory for my Pig script to access. ALTER TABLE historic_weather ADD IF NOT EXISTS PARTITION (yr='2010') LOCATION '/user/oracle/weather/historic/yr=2011'; INSERT OVERWRITE DIRECTORY '/user/oracle/weather/cleaned_history' SELECT w.stn, w.wban, w.weather_year, w.weather_month, w.weather_day, w.temp, w.dewp, w.weather FROM ( FROM historic_weather SELECT TRANSFORM(...) USING '/path/to/hive/filters/ncdc_parser.py' as stn, wban, weather_year, weather_month, weather_day, temp, dewp, weather ) w; Since I'm going to prepare training directories with at least the same frequency that I add partitions, I should also add that to my workflow. Oozie is going to invoke these Hive actions using what's somewhat obviously referred to as a Hive action. Hive actions amount to Oozie running a script file containing our query language statements, so we can place them in a file called weather_train.hql. Starting Our Workflow Oozie offers two types of jobs: workflows and coordinator jobs. Workflows are straightforward: they define a set of actions to perform as a sequence or directed acyclic graph. Coordinator jobs can take all the same actions of Workflow jobs, but they can be automatically started either periodically or when new data arrives in a specified location. To keep things simple we'll make a workflow job; coordinator jobs simply require another XML file for scheduling. The bare minimum for workflow XML defines a name, a starting point, and an end point: <workflow-app name="WeatherMan" xmlns="uri:oozie:workflow:0.1"> <start to="ParseNCDCData"/> <end name="end"/> </workflow-app> To this we need to add an action, and within that we'll specify the hive parameters Also, keep in mind that actions require <ok> and <error> tags to direct the next action on success or failure. <action name="ParseNCDCData"> <hive xmlns="uri:oozie:hive-action:0.2"> <job-tracker>localhost:8021</job-tracker> <name-node>localhost:8020</name-node> <configuration> <property> <name>oozie.hive.defaults</name> <value>/user/oracle/weather_ooze/hive-default.xml</value> </property> </configuration> <script>ncdc_parse.hql</script> </hive> <ok to="WeatherMan"/> <error to="end"/> </action> There are a couple of things to note here: I have to give the FQDN (or IP) and port of my JobTracker and NameNode. I have to include a hive-default.xml file. I have to include a script file. The hive-default.xml and script file must be stored in HDFS That last point is particularly important. Oozie doesn't make assumptions about where a given workflow is being run. You might submit workflows against different clusters, or have different hive-defaults.xml on different clusters (e.g. MySQL or Postgres-backed metastores). A quick way to ensure that all the assets end up in the right place in HDFS is just to make a working directory locally, build your workflow.xml in it, and copy the assets you'll need to it as you add actions to workflow.xml. At this point, our local directory should contain: workflow.xml hive-defaults.xml (make sure this file contains your metastore connection data) ncdc_parse.hql Adding Pig to the Ooze Adding our Pig script as an action is slightly simpler from an XML standpoint. All we do is add an action to workflow.xml as follows: <action name="WeatherMan"> <pig> <job-tracker>localhost:8021</job-tracker> <name-node>localhost:8020</name-node> <script>weather_train.pig</script> </pig> <ok to="end"/> <error to="end"/> </action> Once we've done this, we'll copy weather_train.pig to our working directory. However, there's a bit of a "gotcha" here. My pig script registers the Weka Jar and a chunk of jython. If those aren't also in HDFS, our action will fail from the outset -- but where do we put them? The Jython script goes into the working directory at the same level as the pig script, because pig attempts to load Jython files in the directory from which the script executes. However, that's not where our Weka jar goes. While Oozie doesn't assume much, it does make an assumption about the Pig classpath. Anything under working_directory/lib gets automatically added to the Pig classpath and no longer requires a REGISTER statement in the script. Anything that uses a REGISTER statement cannot be in the working_directory/lib directory. Instead, it needs to be in a different HDFS directory and attached to the pig action with an <archive> tag. Yes, that's as confusing as you think it is. You can get the exact rules for adding Jars to the distributed cache from Oozie's Pig Cookbook. Making the Workflow Work We've got a workflow defined and have collected all the components we'll need to run. But we can't run anything yet, because we still have to define some properties about the job and submit it to Oozie. We need to start with the job properties, as this is essentially the "request" we'll submit to the Oozie server. In the same working directory, we'll make a file called job.properties as follows: nameNode=hdfs://localhost:8020 jobTracker=localhost:8021 queueName=default weatherRoot=weather_ooze mapreduce.jobtracker.kerberos.principal=foo dfs.namenode.kerberos.principal=foo oozie.libpath=${nameNode}/user/oozie/share/lib oozie.wf.application.path=${nameNode}/user/${user.name}/${weatherRoot} outputDir=weather-ooze While some of the pieces of the properties file are familiar (e.g., JobTracker address), others take a bit of explaining. The first is weatherRoot: this is essentially an environment variable for the script (as are jobTracker and queueName). We're simply using them to simplify the directives for the Oozie job. The oozie.libpath pieces is extremely important. This is a directory in HDFS which holds Oozie's shared libraries: a collection of Jars necessary for invoking Hive, Pig, and other actions. It's a good idea to make sure this has been installed and copied up to HDFS. The last two lines are straightforward: run the application defined by workflow.xml at the application path listed and write the output to the output directory. We're finally ready to submit our job! After all that work we only need to do a few more things: Validate our workflow.xml Copy our working directory to HDFS Submit our job to the Oozie server Run our workflow Let's do them in order. First validate the workflow: oozie validate workflow.xml Next, copy the working directory up to HDFS: hadoop fs -put working_dir /user/oracle/working_dir Now we submit the job to the Oozie server. We need to ensure that we've got the correct URL for the Oozie server, and we need to specify our job.properties file as an argument. oozie job -oozie http://url.to.oozie.server:port_number/ -config /path/to/working_dir/job.properties -submit We've submitted the job, but we don't see any activity on the JobTracker? All I got was this funny bit of output: 14-20120525161321-oozie-oracle This is because submitting a job to Oozie creates an entry for the job and places it in PREP status. What we got back, in essence, is a ticket for our workflow to ride the Oozie train. We're responsible for redeeming our ticket and running the job. oozie -oozie http://url.to.oozie.server:port_number/ -start 14-20120525161321-oozie-oracle Of course, if we really want to run the job from the outset, we can change the "-submit" argument above to "-run." This will prep and run the workflow immediately. Takeaway So, there you have it: the somewhat laborious process of building an Oozie workflow. It's a bit tedious the first time out, but it does present a pair of real benefits to those of us who spend a great deal of time data munging. First, when new data arrives that requires the same processing, we already have the workflow defined and ready to run. Second, as we build up a set of useful action definitions over time, creating new workflows becomes quicker and quicker.

Read the article

Extending Oracle CEP with Predictive Analytics

- by vikram.shukla(at)oracle.com

Introduction: OCEP is often used as a business rules engine to execute a set of business logic rules via CQL statements, and take decisions based on the outcome of those rules. There are times where configuring rules manually is sufficient because an application needs to deal with only a small and well-defined set of static rules. However, in many situations customers don't want to pre-define such rules for two reasons. First, they are dealing with events with lots of columns and manually crafting such rules for each column or a set of columns and combinations thereof is almost impossible. Second, they are content with probabilistic outcomes and do not care about 100% precision. The former is the case when a user is dealing with data with high dimensionality, the latter when an application can live with "false" positives as they can be discarded after further inspection, say by a Human Task component in a Business Process Management software. The primary goal of this blog post is to show how this can be achieved by combining OCEP with Oracle Data Mining® and leveraging the latter's rich set of algorithms and functionality to do predictive analytics in real time on streaming events. The secondary goal of this post is also to show how OCEP can be extended to invoke any arbitrary external computation in an RDBMS from within CEP. The extensible facility is known as the JDBC cartridge. The rest of the post describes the steps required to achieve this: We use the dataset available at http://blogs.oracle.com/datamining/2010/01/fraud_and_anomaly_detection_made_simple.html to showcase the capabilities. We use it to show how transaction anomalies or fraud can be detected. Building the model: Follow the self-explanatory steps described at the above URL to build the model. It is very simple - it uses built-in Oracle Data Mining PL/SQL packages to cleanse, normalize and build the model out of the dataset. You can also use graphical Oracle Data Miner® to build the models. To summarize, it involves: Specifying which algorithms to use. In this case we use Support Vector Machines as we're trying to find anomalies in highly dimensional dataset.Build model on the data in the table for the algorithms specified. For this example, the table was populated in the scott/tiger schema with appropriate privileges. Configuring the Data Source: This is the first step in building CEP application using such an integration. Our datasource looks as follows in the server config file. It is advisable that you use the Visualizer to add it to the running server dynamically, rather than manually edit the file. <data-source> <name>DataMining</name> <data-source-params> <jndi-names> <element>DataMining</element> </jndi-names> <global-transactions-protocol>OnePhaseCommit</global-transactions-protocol> </data-source-params> <connection-pool-params> <credential-mapping-enabled></credential-mapping-enabled> <test-table-name>SQL SELECT 1 from DUAL</test-table-name> <initial-capacity>1</initial-capacity> <max-capacity>15</max-capacity> <capacity-increment>1</capacity-increment> </connection-pool-params> <driver-params> <use-xa-data-source-interface>true</use-xa-data-source-interface> <driver-name>oracle.jdbc.OracleDriver</driver-name> <url>jdbc:oracle:thin:@localhost:1522:orcl</url> <properties> <element> <value>scott</value> <name>user</name> </element> <element> <value>{Salted-3DES}AzFE5dDbO2g=</value> <name>password</name> </element> <element> <name>com.bea.core.datasource.serviceName</name> <value>oracle11.2g</value> </element> <element> <name>com.bea.core.datasource.serviceVersion</name> <value>11.2.0</value> </element> <element> <name>com.bea.core.datasource.serviceObjectClass</name> <value>java.sql.Driver</value> </element> </properties> </driver-params> </data-source> Designing the EPN: The EPN is very simple in this example. We briefly describe each of the components. The adapter ("DataMiningAdapter") reads data from a .csv file and sends it to the CQL processor downstream. The event payload here is same as that of the table in the database (refer to the attached project or do a "desc table-name" from a SQL*PLUS prompt). While this is for convenience in this example, it need not be the case. One can still omit fields in the streaming events, and need not match all columns in the table on which the model was built. Better yet, it does not even need to have the same name as columns in the table, as long as you alias them in the USING clause of the mining function. (Caveat: they still need to draw values from a similar universe or domain, otherwise it constitutes incorrect usage of the model). There are two things in the CQL processor ("DataMiningProc") that make scoring possible on streaming events. 1. User defined cartridge function Please refer to the OCEP CQL reference manual to find more details about how to define such functions. We include the function below in its entirety for illustration. <?xml version="1.0" encoding="UTF-8"?> <jdbcctxconfig:config xmlns:jdbcctxconfig="http://www.bea.com/ns/wlevs/config/application" xmlns:jc="http://www.oracle.com/ns/ocep/config/jdbc"> <jc:jdbc-ctx> <name>Oracle11gR2</name> <data-source>DataMining</data-source> <function name="prediction2"> <param name="CQLMONTH" type="char"/> <param name="WEEKOFMONTH" type="int"/> <param name="DAYOFWEEK" type="char" /> <param name="MAKE" type="char" /> <param name="ACCIDENTAREA" type="char" /> <param name="DAYOFWEEKCLAIMED" type="char" /> <param name="MONTHCLAIMED" type="char" /> <param name="WEEKOFMONTHCLAIMED" type="int" /> <param name="SEX" type="char" /> <param name="MARITALSTATUS" type="char" /> <param name="AGE" type="int" /> <param name="FAULT" type="char" /> <param name="POLICYTYPE" type="char" /> <param name="VEHICLECATEGORY" type="char" /> <param name="VEHICLEPRICE" type="char" /> <param name="FRAUDFOUND" type="int" /> <param name="POLICYNUMBER" type="int" /> <param name="REPNUMBER" type="int" /> <param name="DEDUCTIBLE" type="int" /> <param name="DRIVERRATING" type="int" /> <param name="DAYSPOLICYACCIDENT" type="char" /> <param name="DAYSPOLICYCLAIM" type="char" /> <param name="PASTNUMOFCLAIMS" type="char" /> <param name="AGEOFVEHICLES" type="char" /> <param name="AGEOFPOLICYHOLDER" type="char" /> <param name="POLICEREPORTFILED" type="char" /> <param name="WITNESSPRESNT" type="char" /> <param name="AGENTTYPE" type="char" /> <param name="NUMOFSUPP" type="char" /> <param name="ADDRCHGCLAIM" type="char" /> <param name="NUMOFCARS" type="char" /> <param name="CQLYEAR" type="int" /> <param name="BASEPOLICY" type="char" /> <return-component-type>char</return-component-type> <sql><![CDATA[ SELECT to_char(PREDICTION_PROBABILITY(CLAIMSMODEL, '0' USING *)) AS probability FROM (SELECT :CQLMONTH AS MONTH, :WEEKOFMONTH AS WEEKOFMONTH, :DAYOFWEEK AS DAYOFWEEK, :MAKE AS MAKE, :ACCIDENTAREA AS ACCIDENTAREA, :DAYOFWEEKCLAIMED AS DAYOFWEEKCLAIMED, :MONTHCLAIMED AS MONTHCLAIMED, :WEEKOFMONTHCLAIMED, :SEX AS SEX, :MARITALSTATUS AS MARITALSTATUS, :AGE AS AGE, :FAULT AS FAULT, :POLICYTYPE AS POLICYTYPE, :VEHICLECATEGORY AS VEHICLECATEGORY, :VEHICLEPRICE AS VEHICLEPRICE, :FRAUDFOUND AS FRAUDFOUND, :POLICYNUMBER AS POLICYNUMBER, :REPNUMBER AS REPNUMBER, :DEDUCTIBLE AS DEDUCTIBLE, :DRIVERRATING AS DRIVERRATING, :DAYSPOLICYACCIDENT AS DAYSPOLICYACCIDENT, :DAYSPOLICYCLAIM AS DAYSPOLICYCLAIM, :PASTNUMOFCLAIMS AS PASTNUMOFCLAIMS, :AGEOFVEHICLES AS AGEOFVEHICLES, :AGEOFPOLICYHOLDER AS AGEOFPOLICYHOLDER, :POLICEREPORTFILED AS POLICEREPORTFILED, :WITNESSPRESNT AS WITNESSPRESENT, :AGENTTYPE AS AGENTTYPE, :NUMOFSUPP AS NUMOFSUPP, :ADDRCHGCLAIM AS ADDRCHGCLAIM, :NUMOFCARS AS NUMOFCARS, :CQLYEAR AS YEAR, :BASEPOLICY AS BASEPOLICY FROM dual) ]]> </sql> </function> </jc:jdbc-ctx> </jdbcctxconfig:config> 2. Invoking the function for each event. Once this function is defined, you can invoke it from CQL as follows: <?xml version="1.0" encoding="UTF-8"?> <wlevs:config xmlns:wlevs="http://www.bea.com/ns/wlevs/config/application"> <processor> <name>DataMiningProc</name> <rules> <query id="q1"><![CDATA[ ISTREAM(SELECT S.CQLMONTH, S.WEEKOFMONTH, S.DAYOFWEEK, S.MAKE, : S.BASEPOLICY, C.F AS probability FROM StreamDataChannel [NOW] AS S, TABLE(prediction2@Oracle11gR2(S.CQLMONTH, S.WEEKOFMONTH, S.DAYOFWEEK, S.MAKE, ..., S.BASEPOLICY) AS F of char) AS C) ]]></query> </rules> </processor> </wlevs:config> Finally, the last stage in the EPN prints out the probability of the event being an anomaly. One can also define a threshold in CQL to filter out events that are normal, i.e., below a certain mark as defined by the analyst or designer. Sample Runs: Now let's see how this behaves when events are streamed through CEP. We use only two events for brevity, one normal and other one not. This is one of the "normal" looking events and the probability of it being anomalous is less than 60%. Event is: eventType=DataMiningOutEvent object=q1 time=2904821976256 S.CQLMONTH=Dec, S.WEEKOFMONTH=5, S.DAYOFWEEK=Wednesday, S.MAKE=Honda, S.ACCIDENTAREA=Urban, S.DAYOFWEEKCLAIMED=Tuesday, S.MONTHCLAIMED=Jan, S.WEEKOFMONTHCLAIMED=1, S.SEX=Female, S.MARITALSTATUS=Single, S.AGE=21, S.FAULT=Policy Holder, S.POLICYTYPE=Sport - Liability, S.VEHICLECATEGORY=Sport, S.VEHICLEPRICE=more than 69000, S.FRAUDFOUND=0, S.POLICYNUMBER=1, S.REPNUMBER=12, S.DEDUCTIBLE=300, S.DRIVERRATING=1, S.DAYSPOLICYACCIDENT=more than 30, S.DAYSPOLICYCLAIM=more than 30, S.PASTNUMOFCLAIMS=none, S.AGEOFVEHICLES=3 years, S.AGEOFPOLICYHOLDER=26 to 30, S.POLICEREPORTFILED=No, S.WITNESSPRESENT=No, S.AGENTTYPE=External, S.NUMOFSUPP=none, S.ADDRCHGCLAIM=1 year, S.NUMOFCARS=3 to 4, S.CQLYEAR=1994, S.BASEPOLICY=Liability, probability=.58931702982118561 isTotalOrderGuarantee=true\nAnamoly probability: .58931702982118561 However, the following event is scored as an anomaly with a very high probability of 89%. So there is likely to be something wrong with it. A close look reveals that the value of "deductible" field (10000) is not "normal". What exactly constitutes normal here?. If you run the query on the database to find ALL distinct values for the "deductible" field, it returns the following set: {300, 400, 500, 700} Event is: eventType=DataMiningOutEvent object=q1 time=2598483773496 S.CQLMONTH=Dec, S.WEEKOFMONTH=5, S.DAYOFWEEK=Wednesday, S.MAKE=Honda, S.ACCIDENTAREA=Urban, S.DAYOFWEEKCLAIMED=Tuesday, S.MONTHCLAIMED=Jan, S.WEEKOFMONTHCLAIMED=1, S.SEX=Female, S.MARITALSTATUS=Single, S.AGE=21, S.FAULT=Policy Holder, S.POLICYTYPE=Sport - Liability, S.VEHICLECATEGORY=Sport, S.VEHICLEPRICE=more than 69000, S.FRAUDFOUND=0, S.POLICYNUMBER=1, S.REPNUMBER=12, S.DEDUCTIBLE=10000, S.DRIVERRATING=1, S.DAYSPOLICYACCIDENT=more than 30, S.DAYSPOLICYCLAIM=more than 30, S.PASTNUMOFCLAIMS=none, S.AGEOFVEHICLES=3 years, S.AGEOFPOLICYHOLDER=26 to 30, S.POLICEREPORTFILED=No, S.WITNESSPRESENT=No, S.AGENTTYPE=External, S.NUMOFSUPP=none, S.ADDRCHGCLAIM=1 year, S.NUMOFCARS=3 to 4, S.CQLYEAR=1994, S.BASEPOLICY=Liability, probability=.89171554529576691 isTotalOrderGuarantee=true\nAnamoly probability: .89171554529576691 Conclusion: By way of this example, we show: real-time scoring of events as they flow through CEP leveraging Oracle Data Mining.how CEP applications can invoke complex arbitrary external computations (function shipping) in an RDBMS.

Read the article

Let's introduce the Oracle Enterprise Data Quality family!

- by Sarah Zanchetti

The Oracle Enterprise Data Quality family of products helps you to achieve maximum value from their business applications by delivering fit-for-purpose data. OEDQ is a state-of-the-art collaborative data quality profiling, analysis, parsing, standardization, matching and merging product, designed to help you understand, improve, protect and govern the quality of the information your business uses, all from a single integrated environment. Oracle Enterprise Data Quality products are: Oracle Enterprise Data Quality Profile and Audit Oracle Enterprise Data Quality Parsing and Standardization Oracle Enterprise Data Quality Match and Merge Oracle Enterprise Data Quality Address Verification Server Oracle Enterprise Data Quality Product Data Parsing and Standardization Oracle Enterprise Data Quality Product Data Match and Merge Also, the following are some of the key features of OEDQ: Integrated data profiling, auditing, cleansing and matching Browser-based client access Ability to handle all types of data – for example customer, product, asset, financial, operational Connection to any JDBC-compliant data sources and targets Multi-user project support (role-based access, issue tracking, process annotation, and version control) Services Oriented Architecture (SOA) - support for designing processes that may be exposed to external applications as a service Designed to process large data volumes A single repository to hold data along with gathered statistics and project tracking information, with shared access Intuitive graphical user interface designed to help you solve real-world information quality issues quickly Easy, data-led creation and extension of validation and transformation rules Fully extensible architecture allowing the insertion of any required custom processing If you need to learn more about EDQ, or get assistance for any kind of issue, the Oracle Technology Network offers a huge range of resources on Oracle software. Discuss technical problems and solutions on the Discussion Forums. Get hands-on step-by-step tutorials with Oracle By Example. Download Sample Code. Get the latest news and information on any Oracle product. You can also get further help and information with Oracle software from: My Oracle Support Oracle Support Services An Information Center is available, where you can find technical information and fast solutions to the most common already solved issues: Information Center: Oracle Enterprise Data Quality [ID 1555073.2]

Read the article

Big Visible Charts

- by Robert May

An important part of Agile is the concept of transparency and visibility. In proper functioning teams, stakeholders can look at any team at any time in the iteration or release and see how that team is doing by simply looking at what we call Big Visible Charts. If you’ve done Scrum, you’ve seen these charts. However, interpreting these charts can often be an art form. There are several different charts that can be useful. In this newsletter, I’ll focus on the Iteration Burndown and Cumulative Flow charts. I’ve included a copy of the spreadsheet that I used to create the charts, and if you don’t have a tool that creates them for you, you can use this spreadsheet to do so. Our preferred tool for managing Scrum projects is Rally. Rally creates all of these charts for you, saving you quite a bit of time. The Iteration Burndown and Cumulative Flow Charts This is the main chart that teams use. Although less useful to stakeholders, this chart is critical to the team and provides quite a bit of information to the team about how their iteration is going. Most charts are a combination of the charts below, so you may need to combine aspects of each section to understand what is happening in your iterations. Ideal Ah, isn’t that a pretty picture? Unfortunately, it’s also very unrealistic. I’ve seen iterations that come close to ideal, but never that match perfectly. If your iteration matches perfectly, chances are, someone is playing with the numbers. Reality is just too difficult to have a burndown chart that matches this exactly. Late Planning Iteration started, but the team didn’t. You can tell this by the fact that the real number of estimated hours didn’t appear until day two. In the cumulative flow, you can also see that nothing was defined in Day one and two. You want to avoid situations like this. You’ll note that the team had to burn faster than is ideal to meet the iteration because of the late planning. This often results in long weeks and days. Testing Starved Determining whether or not testing is starved is difficult without the cumulative flow. The pattern in the burndown could be nothing more that developers not completing stories early enough or could be caused by stories being too big. With the cumulative flow, however, you see that only small bites are in progress and stories were completed early, but testing didn’t start testing until the end of the iteration, and didn’t complete testing all stories in the iteration. When this happens, question whether or not your testing resources are sufficient for your team and whether or not acceptance is adequately defined. No Testing With this one, both graphs show the same thing; the team needs testers and testing! Without testing, what was completed cannot be verified to make sure that it is acceptable to the business. If you find yourself in this situation, review your testing practices and acceptance testing process and make changes today. Late Development With this situation, both graphs tell a story. In the top graph, you can see that the hours failed to burn down as quickly as the team expected. This could be caused by the team not correctly estimating their hours or the team could have had illness or some other issue that affected them. Often, when teams are tackling something that is more unknown, they’ll run into technical barriers that cause the burn down to happen slower than expected. In the cumulative flow graph, you can see that not much was completed in the first few days. This could be because of illness or technical barriers or simply poor estimation. Testing was able to keep up with everything that was completed, however. No Tool Updating When you see graphs that look like this, you can be assured that it’s because the team is not updating the tool that generates the graphs. Review your policy for when they are to update. On the teams that I run, I require that each team member updates the tool at least once daily. You should also check to see how well the team is breaking down stories into tasks. If they’re creating few large tasks, graphs can look similar to this. As a general rule, I never allow tasks, other than Unit Testing and Uncertainty, to be greater than eight hours in duration. Scope Increase I always encourage team members to enter in however much time they think they have left on a task, even if that means increasing the total amount of time left to do. You get a much better and more realistic picture this way. Increasing time remaining could explain the burndown graph, but by looking at the cumulative flow graph, we can see that stories were added to the iteration and scope was increased. Since planning should consume all of the hours in the iteration, this is almost always a bad thing. If the scope change happened late in the iteration and the hours remaining were well below the ideal burn, then increasing scope is probably o.k., but estimation needs to get better. However, with the charts above, that’s clearly not what happened and the team was required to do extra work to make the iteration. If you find this happening, your product owner and ScrumMasters need training. The team also needs to learn to say no. Scope Decrease Scope decreases are just as bad as scope increases. Usually, graphs above show that the team did a poor job of estimating their stories and part way through had to reduce scope to change the iteration. This will happen once in a while, but if you find it’s a pattern on your team, you need to re-evaluate planning. Some teams are hopelessly optimistic. In those cases, I’ll introduce a task I call “Uncertainty.” With Uncertainty, the team estimates how many hours they might need if things don’t go well with the tasks they’ve defined. They try to estimate things that could go poorly and increase the time appropriately. Having an Uncertainty task allows them to have a low and high estimate. Uncertainty should not just be an arbitrary buffer. It must correlate to real uncertainty in the tasks that have been defined. Stories are too Big Often, we see graphs like the ones above. Note that the burndown looks fairly good, other than the chunky acceptance of stories. However, when you look at cumulative flow, you can see that at one point, everything is in progress. This is a bad thing. When you see graphs like this, you’re in one of two states. You may just have a very small team and can only handle one or two stories in your iteration. If you have more than one or two people, then the most likely problem is that your stories are far too big. To combat this, break large high hour stories into smaller pieces that can be completed independently and accepted independently. If you don’t, you’ll likely be requiring your testers to do heroic things to complete testing on the last day of the iteration and you’re much more likely to have the entire iteration fail, because of the limited amount of things that can be completed. Summary There are other charts that can be useful when doing scrum. If you don’t have any big visible charts, you really need to evaluate your process and change. These charts can provide the team a wealth of information and help you write better software. If you have any questions about charts that you’re seeing on your team, contact me with a screen capture of the charts and I’ll tell you what I’m seeing in those charts. I always want this information to be useful, so please let me know if you have other questions. Technorati Tags: Agile

Search Results

Search found 66013 results on 2641 pages for 'big data analytics'.

Page 41/2641 | < Previous Page | 37 38 39 40 41 42 43 44 45 46 47 48 | Next Page >

- by Rob

- by PerryCS

- by Gino Abraham

- by hinkmond

- by Kellsey Ruppel

- by Codenotguru

- by Adam Tannon

- by Jeff Atwood

- by DrJohn

- by calumbrodie

- by David Dorf

- by HCM-Oracle

- by p.anda

- by Jeffrey McDaniel

- by maria costanzo

- by Cdore

- by andlju

- by andlju

- by Ni

- by Neir0

- by Alexander

- by dan.mcclary

- by vikram.shukla(at)oracle.com

- by Sarah Zanchetti

- by Robert May

< Previous Page | 37 38 39 40 41 42 43 44 45 46 47 48 | Next Page >