Search Results

Search found 59554 results on 2383 pages for 'distributed data mining'.

Page 5/2383 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Resources related to data-mining and gaming on social networks

    - by darren
    Hi all I'm interested in the problem of patterning mining among players of social networking games. For example detecting cheaters of a game, given a company's user database. So far I have been following the usual recipe for a data mining project: construct a data warehouse that aggregates significant information select a classifier, and train it with a subsectio of records from the warehouse validate classifier with another test set lather, rinse, repeat Surprisingly, I've found very little in this area regarding literature, best practices, etc. I am hoping to crowdsource the information gathering problem here. Specifically what I'm looking for: What classifiers have worked will for this type of pattern mining (it seems highly temporal, users playing games, users receiving rewards, users transferring prizes etc). Are there any highly agreed upon attributes specific to social networking / gaming data? What is a practical amount of information that should be considered? One problem I've run into is data overload, where queries and data cleansing may take days to complete. Related to point above, what hardware resources are required to produce results? I've found it difficult to estimate the amount of computing power I will require for production use. It has become apparent that a white box in the corner does not have enough horse-power for such a project. Are companies generally resorting to cloud solutions? Are they buying clusters? Basically, any resources (theoretical, academic, or practical) about implementing a social networking / gaming pattern-mining program would be very much appreciated. Thanks.

    Read the article

  • Building a Redundant / Distrubuted Application

    - by MattW
    This is more of a "point me in the right direction" question. I (and my team of 3) have built a hosted web app that queues and routes customer chat requests to available customer service agents (It does other things as well, but this is enough background to illustrate the issue). The basic dev architecture today is: a single page ajax web UI (ASP.NET MVC) with floating chat windows (think Gmail) a backend Windows service to queue and route the chat requests this service also logs the chats, calculates service levels, etc a Comet server product that routes data between the web frontend and the backend Windows service this also helps us detect which Agents are still connected (online) And our hardware architecture today is: 2 servers to host the web UI portion of the application a load balancer to route requests to the 2 different web app servers a third server to host the SQL Server DB and the backend Windows service responsible for queuing / delivering chats So as it stands today, one of the web app servers could go down and we would be ok. However, if something would happen to the SQL Server / Windows Service server we would be boned. My question - how can I make this backend Windows service logic be able to be spread across multiple machines (distributed)? The Windows service is written to accept requests from the Comet server, check for available Agents, and route the chat to those agents. How can I make this more distributed? How can I make it so that I can distribute the work of the backend Windows service can be spread across multiple machines for redundancy and uptime purposes? Will I need to re-write it with distributed computing in mind? I should also note that I am hosting all of this on Rackspace Cloud instances - so maybe it is something I should be less concerned about? Thanks in advance for any help!

    Read the article

  • SQL SERVER – Guest Post – Architecting Data Warehouse – Niraj Bhatt

    - by pinaldave
    Niraj Bhatt works as an Enterprise Architect for a Fortune 500 company and has an innate passion for building / studying software systems. He is a top rated speaker at various technical forums including Tech·Ed, MCT Summit, Developer Summit, and Virtual Tech Days, among others. Having run a successful startup for four years Niraj enjoys working on – IT innovations that can impact an enterprise bottom line, streamlining IT budgets through IT consolidation, architecture and integration of systems, performance tuning, and review of enterprise applications. He has received Microsoft MVP award for ASP.NET, Connected Systems and most recently on Windows Azure. When he is away from his laptop, you will find him taking deep dives in automobiles, pottery, rafting, photography, cooking and financial statements though not necessarily in that order. He is also a manager/speaker at BDOTNET, Asia’s largest .NET user group. Here is the guest post by Niraj Bhatt. As data in your applications grows it’s the database that usually becomes a bottleneck. It’s hard to scale a relational DB and the preferred approach for large scale applications is to create separate databases for writes and reads. These databases are referred as transactional database and reporting database. Though there are tools / techniques which can allow you to create snapshot of your transactional database for reporting purpose, sometimes they don’t quite fit the reporting requirements of an enterprise. These requirements typically are data analytics, effective schema (for an Information worker to self-service herself), historical data, better performance (flat data, no joins) etc. This is where a need for data warehouse or an OLAP system arises. A Key point to remember is a data warehouse is mostly a relational database. It’s built on top of same concepts like Tables, Rows, Columns, Primary keys, Foreign Keys, etc. Before we talk about how data warehouses are typically structured let’s understand key components that can create a data flow between OLTP systems and OLAP systems. There are 3 major areas to it: a) OLTP system should be capable of tracking its changes as all these changes should go back to data warehouse for historical recording. For e.g. if an OLTP transaction moves a customer from silver to gold category, OLTP system needs to ensure that this change is tracked and send to data warehouse for reporting purpose. A report in context could be how many customers divided by geographies moved from sliver to gold category. In data warehouse terminology this process is called Change Data Capture. There are quite a few systems that leverage database triggers to move these changes to corresponding tracking tables. There are also out of box features provided by some databases e.g. SQL Server 2008 offers Change Data Capture and Change Tracking for addressing such requirements. b) After we make the OLTP system capable of tracking its changes we need to provision a batch process that can run periodically and takes these changes from OLTP system and dump them into data warehouse. There are many tools out there that can help you fill this gap – SQL Server Integration Services happens to be one of them. c) So we have an OLTP system that knows how to track its changes, we have jobs that run periodically to move these changes to warehouse. The question though remains is how warehouse will record these changes? This structural change in data warehouse arena is often covered under something called Slowly Changing Dimension (SCD). While we will talk about dimensions in a while, SCD can be applied to pure relational tables too. SCD enables a database structure to capture historical data. This would create multiple records for a given entity in relational database and data warehouses prefer having their own primary key, often known as surrogate key. As I mentioned a data warehouse is just a relational database but industry often attributes a specific schema style to data warehouses. These styles are Star Schema or Snowflake Schema. The motivation behind these styles is to create a flat database structure (as opposed to normalized one), which is easy to understand / use, easy to query and easy to slice / dice. Star schema is a database structure made up of dimensions and facts. Facts are generally the numbers (sales, quantity, etc.) that you want to slice and dice. Fact tables have these numbers and have references (foreign keys) to set of tables that provide context around those facts. E.g. if you have recorded 10,000 USD as sales that number would go in a sales fact table and could have foreign keys attached to it that refers to the sales agent responsible for sale and to time table which contains the dates between which that sale was made. These agent and time tables are called dimensions which provide context to the numbers stored in fact tables. This schema structure of fact being at center surrounded by dimensions is called Star schema. A similar structure with difference of dimension tables being normalized is called a Snowflake schema. This relational structure of facts and dimensions serves as an input for another analysis structure called Cube. Though physically Cube is a special structure supported by commercial databases like SQL Server Analysis Services, logically it’s a multidimensional structure where dimensions define the sides of cube and facts define the content. Facts are often called as Measures inside a cube. Dimensions often tend to form a hierarchy. E.g. Product may be broken into categories and categories in turn to individual items. Category and Items are often referred as Levels and their constituents as Members with their overall structure called as Hierarchy. Measures are rolled up as per dimensional hierarchy. These rolled up measures are called Aggregates. Now this may seem like an overwhelming vocabulary to deal with but don’t worry it will sink in as you start working with Cubes and others. Let’s see few other terms that we would run into while talking about data warehouses. ODS or an Operational Data Store is a frequently misused term. There would be few users in your organization that want to report on most current data and can’t afford to miss a single transaction for their report. Then there is another set of users that typically don’t care how current the data is. Mostly senior level executives who are interesting in trending, mining, forecasting, strategizing, etc. don’t care for that one specific transaction. This is where an ODS can come in handy. ODS can use the same star schema and the OLAP cubes we saw earlier. The only difference is that the data inside an ODS would be short lived, i.e. for few months and ODS would sync with OLTP system every few minutes. Data warehouse can periodically sync with ODS either daily or weekly depending on business drivers. Data marts are another frequently talked about topic in data warehousing. They are subject-specific data warehouse. Data warehouses that try to span over an enterprise are normally too big to scope, build, manage, track, etc. Hence they are often scaled down to something called Data mart that supports a specific segment of business like sales, marketing, or support. Data marts too, are often designed using star schema model discussed earlier. Industry is divided when it comes to use of data marts. Some experts prefer having data marts along with a central data warehouse. Data warehouse here acts as information staging and distribution hub with spokes being data marts connected via data feeds serving summarized data. Others eliminate the need for a centralized data warehouse citing that most users want to report on detailed data. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Best Practices, Business Intelligence, Data Warehousing, Database, Pinal Dave, PostADay, Readers Contribution, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • Master Data Services Employees Sample Model

    - by Davide Mauri
    I’ve been playing with Master Data Services quite a lot in those last days and I’m also monitoring the web for all available resources on it. Today I’ve found this freshly released sample available on MSDN Code Gallery: SQL Server Master Data Services Employee Sample Model http://code.msdn.microsoft.com/SSMDSEmployeeSample This sample shows how Recursive Hierarchies can be modeled in order to represent a typical organizational chart scenario where a self-relationship exists on the Employee entity. Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

    Read the article

  • Looking for Cutting-Edge Data Integration: 2010 Innovation Awards

    - by dain.hansen
    This year's Oracle Fusion Middleware Innovation Awards will honor customers and partners who are creatively using to various products across Oracle Fusion Middleware. Brand new to this year's awards is a category for Data Integration. Think you have something unique and innovative with one of our Oracle Data Integration products? We'd love to hear from you! Please submit today The deadline for the nomination is 5 p.m. PT Friday, August 6th 2010, and winning organizations will be notified by late August 2010. What you win! FREE pass to Oracle OpenWorld 2010 in San Francisco for select winners in each category. Honored by Oracle executives at awards ceremony held during Oracle OpenWorld 2010 in San Francisco. Oracle Middleware Innovation Award Winner Plaque 1-3 meetings with Oracle Executives during Oracle OpenWorld 2010 Feature article placement in Oracle Magazine and placement in Oracle Press Release Customer snapshot and video testimonial opportunity, to be hosted on oracle.com Podcast interview opportunity with Senior Oracle Executive

    Read the article

  • Data Integration 12c Raising the Big Data Roof at Oracle OpenWorld

    - by Tanu Sood
    Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-family:"Times New Roman","serif"; mso-fareast-font-family:"MS Mincho";} Author: Dain Hansen, Director, Oracle It was an exciting OpenWorld 2013 for us in the Data Integration track. Our theme this year was all about ‘being future ready’ - previewing one of our biggest releases this year: Oracle Data Integration 12c. Just this week we followed up with this preview by announcing the general availability of 12c release for Oracle’s key data integration products: Oracle Data Integrator 12c and Oracle GoldenGate 12c. The new release delivers extreme performance, increase IT productivity, and simplify deployment, while helping IT organizations to keep pace with new data-oriented technology trends including cloud computing, big data analytics, real-time business intelligence. Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-family:"Times New Roman","serif"; mso-fareast-font-family:"MS Mincho";} Mark Hurd's keynote on day one set the tone for the Data Integration sessions. Mark focused on big data analytics and the changing consumer expectations. Especially real-time insight is a key theme for Oracle overall and data integration products. In Mark Hurd's keynote we heard from key customers, such as Airbus and Thomson Reuters, how real-time analysis of operational data including machine data creates value, in some cases even saves lives. Thomas Kurian gave a deeper look into Oracle's big data and fast data solutions. In the initial lead Data Integration track session - Brad Adelberg, VP of Development, presented Oracle’s Data Integration 12c product strategy based on key trends from the initial OpenWorld keynotes. Brad talked about how Oracle's data integration products address the new data integration requirements that evolved with cloud computing, big data, and changing consumer expectations and how they set the key themes in our products’ road map. Brad explained why and how fast-time to value, high-performance and future-ready solutions is the top focus areas for product development. If you were not able to attend OpenWorld or this session I recommend reading the white paper: Five New Data Integration Requirements and How to Meet them with Oracle Data Integration, which provides an in-depth look into how Oracle addresses the new trends in the DI market. Following Brad’s session, Nick Wagner provided in depth review of Oracle GoldenGate’s latest features and roadmap. Nick discussed how Oracle GoldenGate’s tight integration with Oracle Database sets the product apart from the competition. We also heard that heterogeneity of the product is still a major focus for GoldenGate’s development and there will be more news on that front when there is a major release. Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-family:"Times New Roman","serif"; mso-fareast-font-family:"MS Mincho";} After GoldenGate’s product strategy session, Denis Gray from the PM team presented Oracle Data Integrator’s product strategy session, talking about the latest and greatest on ODI. Another good session was delivered by long-time GoldenGate users, Comcast.  Jason Hurd and Amit Patel of Comcast talked about the various use cases they deploy Oracle GoldenGate throughout their enterprise, from database upgrades, feeding reporting systems, to active-active database synchronization.  The Comcast team shared many good tips on how to use GoldenGate for both zero downtime upgrades and active-active replication with conflict management requirement. One of our other important goals we had this year for the Data Integration track at OpenWorld was hearing from our customers. We ended day 1 on just that, with a wonderful award ceremony for Oracle Excellence Awards for Oracle Fusion Middleware Innovation. The ceremony was held in the Yerba Buena Center for the Arts. Congratulations to Royal Bank of Scotland and Yalumba Wine Company, the winners in the Data Integration category. You can find more information on the award and the winners in our previous blog post: 2013 Oracle Excellence Awards for Fusion Middleware Innovation… Selected for their innovation use of Oracle’s Data Integration products; the winners for the Data Integration Category are Royal Bank of Scotland and The Yalumba Wine Company. Congratulations!!! Royal Bank of Scotland’s Market and International Banking division provides clients across the globe with seamless trading and competitive pricing, underpinned by a deep knowledge of risk management across the full spectrum of financial products. They handle millions of transactions daily to keep the lifeblood of their clients’ businesses flowing – whether through payment management solutions or through bespoke trade finance solutions. Royal Bank of Scotland is leveraging Oracle GoldenGate and Oracle Data Integrator along with Oracle Business Intelligence Enterprise Edition and the Oracle Database for a variety of solutions. Mainly, Oracle GoldenGate and Oracle Data Integrator are used to feed their data warehouse – providing a real-time data integration solution that feeds transactional data to their analytics system in minutes to enable improved decision making with timely, accurate data for their business users. Oracle Data Integrator’s in-database transformation capabilities and its ability to integrate with Oracle GoldenGate for real-time data capture is the foundation of this implementation. This solution makes it such that changes happening in the analytics systems are available the same day they are deployed on the operational system with 100% data quality guaranteed. Additionally, the solution has helped to reduce their operational database size from 150GB to 10GB. Impressive! Now what if I told you this solution was built in 3 months and had a less than 6 month return on investment? That’s outstanding! The Yalumba Wine Company is situated in the Barossa Valley of Australia. It is the oldest family owned winery in Australia with a unique way of aging their wines in specially crafted 100 liter barrels. Did you know that “Yalumba” is Aboriginal for “all the land around”? The Yalumba Wine Company is growing rapidly, and was in need of introducing a more modern standard to the existing manufacturing processes to meet globalization demands, overall time-to-market, and better operational efficiency objectives of product development. The Yalumba Wine Company worked with a partner, Bristlecone to develop a unique solution whereby Oracle Data Integrator is leveraged to pull data from Salesforce.com and JD Edwards, in addition to their other pre-existing source systems, for consumption into their data warehouse. They have emphasized the overall ease of developing integration workflows with Oracle Data Integrator. The solution has brought better visibility for the business users, shorter data loading and transformation performance to their data warehouse with rapid incorporation of new data sources, and a solid future-proof foundation for their organization. Moving forward, they plan on leveraging more from Oracle’s Data Integration portfolio. Terrific! In addition to these two customers on Tuesday we featured many other important Oracle Data Integrator and Oracle GoldenGate customers. On Tuesday the GoldenGate panel included: Land O’Lakes, Smuckers, and Veolia Water. Besides giving us yummy nutrition and healthy water, these companies have another aspect in common. They all use GoldenGate to boost their ERP application. Please read the recap by Irem Radzik. On Wednesday, the ODI Panel included: Barry Ralston and Ryan Weber of Infinity Insurance, Paul Stracke of Paychex Inc., and Ian Wall of Vertex Pharmaceuticals for a session filled with interesting projects, use cases and approaches to leveraging Oracle Data Integrator. Please read the recap by Sandrine Riley for more. Thanks to everyone who joined with us and we hope to stay connected! To hear more about our Data Integration12c products join us in an upcoming webcast to learn more. Follow us www.twitter.com/ORCLGoldenGate or goto our website at www.oracle.com/goto/dataintegration

    Read the article

  • Fast Data - Big Data's achilles heel

    - by thegreeneman
    At OOW 2013 in Mark Hurd and Thomas Kurian's keynote, they discussed Oracle's Fast Data software solution stack and discussed a number of customers deploying Oracle's Big Data / Fast Data solutions and in particular Oracle's NoSQL Database.  Since that time, there have been a large number of request seeking clarification on how the Fast Data software stack works together to deliver on the promise of real-time Big Data solutions.   Fast Data is a software solution stack that deals with one aspect of Big Data, high velocity.   The software in the Fast Data solution stack involves 3 key pieces and their integration:  Oracle Event Processing, Oracle Coherence, Oracle NoSQL Database.   All three of these technologies address a high throughput, low latency data management requirement.   Oracle Event Processing enables continuous query to filter the Big Data fire hose, enable intelligent chained events to real-time service invocation and augments the data stream to provide Big Data enrichment. Extended SQL syntax allows the definition of sliding windows of time to allow SQL statements to look for triggers on events like breach of weighted moving average on a real-time data stream.    Oracle Coherence is a distributed, grid caching solution which is used to provide very low latency access to cached data when the data is too big to fit into a single process, so it is spread around in a grid architecture to provide memory latency speed access.  It also has some special capabilities to deploy remote behavioral execution for "near data" processing.   The Oracle NoSQL Database is designed to ingest simple key-value data at a controlled throughput rate while providing data redundancy in a cluster to facilitate highly concurrent low latency reads.  For example, when large sensor networks are generating data that need to be captured while analysts are simultaneously extracting the data using range based queries for upstream analytics.  Another example might be storing cookies from user web sessions for ultra low latency user profile management, also leveraging that data using holistic MapReduce operations with your Hadoop cluster to do segmented site analysis.  Understand how NoSQL plays a critical role in Big Data capture and enrichment while simultaneously providing a low latency and scalable data management infrastructure thru clustered, always on, parallel processing in a shared nothing architecture. Learn how easily a NoSQL cluster can be deployed to provide essential services in industry specific Fast Data solutions. See these technologies work together in a demonstration highlighting the salient features of these Fast Data enabling technologies in a location based personalization service. The question then becomes how do these things work together to deliver an end to end Fast Data solution.  The answer is that while different applications will exhibit unique requirements that may drive the need for one or the other of these technologies, often when it comes to Big Data you may need to use them together.   You may have the need for the memory latencies of the Coherence cache, but just have too much data to cache, so you use a combination of Coherence and Oracle NoSQL to handle extreme speed cache overflow and retrieval.   Here is a great reference to how these two technologies are integrated and work together.  Coherence & Oracle NoSQL Database.   On the stream processing side, it is similar as with the Coherence case.  As your sliding windows get larger, holding all the data in the stream can become difficult and out of band data may need to be offloaded into persistent storage.  OEP needs an extreme speed database like Oracle NoSQL Database to help it continue to perform for the real time loop while dealing with persistent spill in the data stream.  Here is a great resource to learn more about how OEP and Oracle NoSQL Database are integrated and work together.  OEP & Oracle NoSQL Database.

    Read the article

  • Oracle Announces Oracle Big Data Appliance X3-2 and Enhanced Oracle Big Data Connectors

    - by jgelhaus
    Enables Customers to Easily Harness the Business Value of Big Data at Lower Cost Engineered System Simplifies Big Data for the Enterprise Oracle Big Data Appliance X3-2 hardware features the latest 8-core Intel® Xeon E5-2600 series of processors, and compared with previous generation, the 18 compute and storage servers with 648 TB raw storage now offer: 33 percent more processing power with 288 CPU cores; 33 percent more memory per node with 1.1 TB of main memory; and up to a 30 percent reduction in power and cooling Oracle Big Data Appliance X3-2 further simplifies implementation and management of big data by integrating all the hardware and software required to acquire, organize and analyze big data. It includes: Support for CDH4.1 including software upgrades developed collaboratively with Cloudera to simplify NameNode High Availability in Hadoop, eliminating the single point of failure in a Hadoop cluster; Oracle NoSQL Database Community Edition 2.0, the latest version that brings better Hadoop integration, elastic scaling and new APIs, including JSON and C support; The Oracle Enterprise Manager plug-in for Big Data Appliance that complements Cloudera Manager to enable users to more easily manage a Hadoop cluster; Updated distributions of Oracle Linux and Oracle Java Development Kit; An updated distribution of open source R, optimized to work with high performance multi-threaded math libraries Read More   Data sheet: Oracle Big Data Appliance X3-2 Oracle Big Data Appliance: Datacenter Network Integration Big Data and Natural Language: Extracting Insight From Text Thomson Reuters Discusses Oracle's Big Data Platform Connectors Integrate Hadoop with Oracle Big Data Ecosystem Oracle Big Data Connectors is a suite of software built by Oracle to integrate Apache Hadoop with Oracle Database, Oracle Data Integrator, and Oracle R Distribution. Enhancements to Oracle Big Data Connectors extend these data integration capabilities. With updates to every connector, this release includes: Oracle SQL Connector for Hadoop Distributed File System, for high performance SQL queries on Hadoop data from Oracle Database, enhanced with increased automation and querying of Hive tables and now supported within the Oracle Data Integrator Application Adapter for Hadoop; Transparent access to the Hive Query language from R and introduction of new analytic techniques executing natively in Hadoop, enabling R developers to be more productive by increasing access to Hadoop in the R environment. Read More Data sheet: Oracle Big Data Connectors High Performance Connectors for Load and Access of Data from Hadoop to Oracle Database

    Read the article

  • NEW 2-Day Instructor Led Course on Oracle Data Mining Now Available!

    - by chberger
    A NEW 2-Day Instructor Led Course on Oracle Data Mining has been developed for customers and anyone wanting to learn more about data mining, predictive analytics and knowledge discovery inside the Oracle Database.  Course Objectives: Explain basic data mining concepts and describe the benefits of predictive analysis Understand primary data mining tasks, and describe the key steps of a data mining process Use the Oracle Data Miner to build,evaluate, and apply multiple data mining models Use Oracle Data Mining's predictions and insights to address many kinds of business problems, including: Predict individual behavior, Predict values, Find co-occurring events Learn how to deploy data mining results for real-time access by end-users Five reasons why you should attend this 2 day Oracle Data Mining Oracle University course. With Oracle Data Mining, a component of the Oracle Advanced Analytics Option, you will learn to gain insight and foresight to: Go beyond simple BI and dashboards about the past. This course will teach you about "data mining" and "predictive analytics", analytical techniques that can provide huge competitive advantage Take advantage of your data and investment in Oracle technology Leverage all the data in your data warehouse, customer data, service data, sales data, customer comments and other unstructured data, point of sale (POS) data, to build and deploy predictive models throughout the enterprise. Learn how to explore and understand your data and find patterns and relationships that were previously hidden Focus on solving strategic challenges to the business, for example, targeting "best customers" with the right offer, identifying product bundles, detecting anomalies and potential fraud, finding natural customer segments and gaining customer insight.

    Read the article

  • Best approach to accessing multiple data source in a web application

    - by ced
    I've a base web application developed with .net technologies (asp.net) used into our LAN by 30 users simultanousley. From this web application I've developed two verticalization used from online users. In future i expect hundreds users simultanousley. Our company has different locations. Each site use its own database. The web application needs to retrieve information from all existing databases. Currently there are 3 database, but it's not excluded in the future expansion of new offices. My question then is: What is the best strategy for a web application to retrieve information from different databases (which have the same schema) whereas the main objective performance data access and high fault tolerance? There are case studies in the literature that I can take as an example? Do you know some good documents to study? Do you have any tips to implement this task so efficient? Intuitively I would say that two possible strategy are: perform queries from different sources in real time and aggregate data on the fly; create a repository that contains the union of the entities of interest and perform queries directly on repository;

    Read the article

  • More details on America's Cup use of Oracle Data Mining

    - by charlie.berger
    BMW Oracle Racing's America's Cup: A Victory for Database Technology BMW Oracle Racing's victory in the 33rd America's Cup yacht race in February showcased the crew's extraordinary sailing expertise. But to hear them talk, the real stars weren't actually human. "The story of this race is in the technology," says Ian Burns, design coordinator for BMW Oracle Racing. Gathering and Mining Sailing DataFrom the drag-resistant hull to its 23-story wing sail, the BMW Oracle USA trimaran is a technological marvel. But to learn to sail it well, the crew needed to review enormous amounts of reliable data every time they took the boat for a test run. Burns and his team collected performance data from 250 sensors throughout the trimaran at the rate of 10 times per second. An hour of sailing alone generates 90 million data points.BMW Oracle Racing turned to Oracle Data Mining in Oracle Database 11g to extract maximum value from the data. Burns and his team reviewed and shared raw data with crew members daily using a Web application built in Oracle Application Express (Oracle APEX). "Someone would say, 'Wouldn't it be great if we could look at some new combination of numbers?' We could quickly build an Oracle Application Express application and share the information during the same meeting," says Burns. Analyzing Wind and Other Environmental ConditionsBurns then streamed the data to the Oracle Austin Data Center, where a dedicated team tackled deeper analysis. Because the data was collected in an Oracle Database, the Data Center team could dive straight into the analytics problems without having to do any extract, transform, and load processes or data conversion. And the many advanced data mining algorithms in Oracle Data Mining allowed the analytics team to build vital performance analytics. For example, the technology team could remove masking elements such as environmental conditions to give accurate data on the best mast rotation for certain wind conditions. Without the data mining, Burns says the boat wouldn't have run as fast. "The design of the boat was important, but once you've got it designed, the whole race is down to how the guys can use it," he says. "With Oracle database technology we could compare the incremental improvements in our performance from the first day of sailing to the very last day. With data mining we could check data against the things we saw, and we could find things that weren't otherwise easily observable and findable."

    Read the article

  • Data Mining - Predictive Analysis

    - by IMHO
    We are looking at acquiring Data Mining software to primarily run predictive analysis processes. How does SQL Server Data Mining solution compares to other solutions like SPSS from IBM? Since SQL Server DM is included in SQL Server Enterprise license - what would be the justification to spend extra couple 100K to buy separate software just to do DM?

    Read the article

  • Data Mining project ideas?

    - by Andriyev
    Hi I am looking for project ideas in the field of data mining. I expect to complete it in a quarter and intend to use C++, Linux as the environment. The course I'm taking aims to build the basics of data mining and covers topics like Classification, Regression-Modeling, Clustering and Association learning. Please point me to some good ideas which I can chew on. cheers

    Read the article

  • Great data mining quotes

    - by Andrei Savu
    I'm searching some data mining related quotes. Can you tell me some of the quotes you like? On the internet I have only found this site: http://www.quotesea.com/quotes/with/data%20mining Thanks.

    Read the article

  • SQL SERVER – Introduction to Big Data – Guest Post

    - by pinaldave
    BIG Data – such a big word – everybody talks about this now a days. It is the word in the database world. In one of the conversation I asked my friend Jasjeet Sigh the same question – what is Big Data? He instantly came up with a very effective write-up.  Jasjeet is working as a Technical Manager with Koenig Solutions. He leads the SQL domain, and holds rich IT industry experience. Talking about Koenig, it is a 19 year old IT training company that offers several certification choices. Some of its courses include SharePoint Training, Project Management certifications, Microsoft Trainings, Business Intelligence programs, Web Design and Development courses etc. Big Data, as the name suggests, is about data that is BIG in nature. The data is BIG in terms of size, and it is difficult to manage such enormous data with relational database management systems that are quite popular these days. Big Data is not just about being large in size, it is also about the variety of the data that differs in form or type. Some examples of Big Data are given below : Scientific data related to weather and atmosphere, Genetics etc Data collected by various medical procedures, such as Radiology, CT scan, MRI etc Data related to Global Positioning System Pictures and Videos Radio Frequency Data Data that may vary very rapidly like stock exchange information Apart from difficulties in managing and storing such data, it is difficult to query, analyze and visualize it. The characteristics of Big Data can be defined by four Vs: Volume: It simply means a large volume of data that may span Petabyte, Exabyte and so on. However it also depends organization to organization that what volume of data they consider as Big Data. Variety: As discussed above, Big Data is not limited to relational information or structured Data. It can also include unstructured data like pictures, videos, text, audio etc. Velocity:  Velocity means the speed by which data changes. The higher is the velocity, the more efficient should be the system to capture and analyze the data. Missing any important point may lead to wrong analysis or may even result in loss. Veracity: It has been recently added as the fourth V, and generally means truthfulness or adherence to the truth. In terms of Big Data, it is more of a challenge than a characteristic. It is difficult to ascertain the truth out of the enormous amount of data and the one that has high velocity. There are always chances of having un-precise and uncertain data. It is a challenging task to clean such data before it is analyzed. Big Data can be considered as the next big thing in the IT sector in terms of innovation and development. If appropriate technologies are developed to analyze and use the information, it can be the driving force for almost all industrial segments. These include Retail, Manufacturing, Service, Finance, Healthcare etc. This will help them to automate business decisions, increase productivity, and innovate and develop new products. Thanks Jasjeet Singh for an excellent write up.  Jasjeet Sign is working as a Technical Manager with Koenig Solutions. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Database, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Big Data

    Read the article

  • Creating a Corporate Data Hub

    - by BuckWoody
    The Windows Azure Marketplace has a rich assortment of data and software offerings for you to use – a type of Software as a Service (SaaS) for IT workers, not necessarily for end-users. Among those offerings is the “Data Hub” – a  codename for a project that ironically actually does what the codename says. In many of our organizations, we have multiple data quality issues. Finding data is one problem, but finding it just once is often a bigger problem. Lots of departments and even individuals have stored the same data more than once, and in some cases, made changes to one of the copies. It’s difficult to know which location or version of the data is authoritative. Then there’s the problem of accessing the data. It’s fairly straightforward to publish a database, share or other location internally to store the data. But then you have to figure out who owns it, how it is controlled, and pass out the various connection strings to those who want to use it. And then you need to figure out how to let folks access the internal data externally – bringing up all kinds of security issues. Finally, in many cases our user community wants us to combine data from the internally sources with external data, bringing up the security, strings, and exploration features up all over again. Enter the Data Hub. This is an online offering, where you assign an administrator and data stewards. You import the data into the service, and it’s available to you - and only you and your organization if you wish. The basic steps for this service are to set up the portal for your company, assign administrators and permissions, and then you assign data areas and import data into them. From there you make them discoverable, and then you have multiple options that you or your users can access that data. You’re then able, if you wish, to combine that data with other data in one location. So how does all that work? What about security? Is it really that easy? And can you really move the data definition off to the Subject Matter Experts (SME’s) that know the particular data stack better than the IT team does? Well, nothing good is easy – but using the Data Hub is actually pretty simple. I’ll give you a link in a moment where you can sign up and try this yourself. Once you sign up, you assign an administrator. From there you’ll create data areas, and then use a simple interface to bring the data in. All of this is done in a portal interface – nothing to install, configure, update or manage. After the data is entered in, and you’ve assigned meta-data to describe it, your users have multiple options to access it. They can simply use the portal – which actually has powerful visualizations you can use on any platform, even mobile phones or tablets.     Your users can also hit the data with Excel – which gives them ultimate flexibility for display, all while using an authoritative, single reference for the data. Since the service is online, they can do this wherever they are – given the proper authentication and permissions. You can also hit the service with simple API calls, like this one from C#: http://msdn.microsoft.com/en-us/library/hh921924  You can make HTTP calls instead of code, and the data can even be exposed as an OData Feed. As you can see, there are a lot of options. You can check out the offering here: http://www.microsoft.com/en-us/sqlazurelabs/labs/data-hub.aspx and you can read the documentation here: http://msdn.microsoft.com/en-us/library/hh921938

    Read the article

  • Creating a Corporate Data Hub

    - by BuckWoody
    The Windows Azure Marketplace has a rich assortment of data and software offerings for you to use – a type of Software as a Service (SaaS) for IT workers, not necessarily for end-users. Among those offerings is the “Data Hub” – a  codename for a project that ironically actually does what the codename says. In many of our organizations, we have multiple data quality issues. Finding data is one problem, but finding it just once is often a bigger problem. Lots of departments and even individuals have stored the same data more than once, and in some cases, made changes to one of the copies. It’s difficult to know which location or version of the data is authoritative. Then there’s the problem of accessing the data. It’s fairly straightforward to publish a database, share or other location internally to store the data. But then you have to figure out who owns it, how it is controlled, and pass out the various connection strings to those who want to use it. And then you need to figure out how to let folks access the internal data externally – bringing up all kinds of security issues. Finally, in many cases our user community wants us to combine data from the internally sources with external data, bringing up the security, strings, and exploration features up all over again. Enter the Data Hub. This is an online offering, where you assign an administrator and data stewards. You import the data into the service, and it’s available to you - and only you and your organization if you wish. The basic steps for this service are to set up the portal for your company, assign administrators and permissions, and then you assign data areas and import data into them. From there you make them discoverable, and then you have multiple options that you or your users can access that data. You’re then able, if you wish, to combine that data with other data in one location. So how does all that work? What about security? Is it really that easy? And can you really move the data definition off to the Subject Matter Experts (SME’s) that know the particular data stack better than the IT team does? Well, nothing good is easy – but using the Data Hub is actually pretty simple. I’ll give you a link in a moment where you can sign up and try this yourself. Once you sign up, you assign an administrator. From there you’ll create data areas, and then use a simple interface to bring the data in. All of this is done in a portal interface – nothing to install, configure, update or manage. After the data is entered in, and you’ve assigned meta-data to describe it, your users have multiple options to access it. They can simply use the portal – which actually has powerful visualizations you can use on any platform, even mobile phones or tablets.     Your users can also hit the data with Excel – which gives them ultimate flexibility for display, all while using an authoritative, single reference for the data. Since the service is online, they can do this wherever they are – given the proper authentication and permissions. You can also hit the service with simple API calls, like this one from C#: http://msdn.microsoft.com/en-us/library/hh921924  You can make HTTP calls instead of code, and the data can even be exposed as an OData Feed. As you can see, there are a lot of options. You can check out the offering here: http://www.microsoft.com/en-us/sqlazurelabs/labs/data-hub.aspx and you can read the documentation here: http://msdn.microsoft.com/en-us/library/hh921938

    Read the article

  • Ideal data structure/techniques for storing generic scheduler data in C#

    - by GraemeMiller
    I am trying to implement a generic scheduler object in C# 4 which will output a table in HTML. Basic aim is to show some object along with various attributes, and whether it was doing something in a given time period. The scheduler will output a table displaying the headers: Detail Field 1 ....N| Date1.........N I want to initialise the table with a start date and an end date to create the date range (ideally could also do other time periods e.g. hours but that isn't vital). I then want to provide a generic object which will have associated events. Where an object has events within the period I want a table cell to be marked E.g. Name Height Weight 1/1/2011 2/1/2011 3/1/20011...... 31/1/2011 Ben 5.11 75 X X X Bill 5.7 83 X X So I created scheduler with Start Date=1/1/2011 and end date 31/1/2011 I'd like to give it my person object (already sorted) and tell it which fields I want displayed (Name, Height, Weight) Each person has events which have a start date and end date. Some events will start and end outwith but they should still be shown on the relevant date etc. Ideally I'd like to have been able to provide it with say a class booking object as well. So I'm trying to keep it generic. I have seen Javasript implementations etc of similar. What would a good data structure be for this? Any thoughts on techniques I could use to make it generic. I am not great with generics so any tips appreciated.

    Read the article

  • open source gossip-based membership protocol?

    - by Aaron
    I am looking for a library which I can plug into a distributed application which implements any gossip-based membership protocol. Such a library would allow me to send/receive membership lists, merge received membership lists, etc... Even better would be if the library implemented a protocol with performance O(logn) performance guarantees. Does anyone know of any open source library like this? It doesn't need to meet all of the aforementioned requirements; even something partially implemented would be helpful.

    Read the article

  • Python, web log data mining for frequent patterns

    - by descent
    Hello! I need to develop a tool for web log data mining. Having many sequences of urls, requested in a particular user session (retrieved from web-application logs), I need to figure out the patterns of usage and groups (clusters) of users of the website. I am new to Data Mining, and now examining Google a lot. Found some useful info, i.e. querying Frequent Pattern Mining in Web Log Data seems to point to almost exactly similar studies. So my questions are: Are there any python-based tools that do what I need or at least smth similar? Can Orange toolkit be of any help? Can reading the book Programming Collective Intelligence be of any help? What to Google for, what to read, which relatively simple algorithms to use best? I am very limited in time (to around a week), so any help would be extremely precious. What I need is to point me into the right direction and the advice of how to accomplish the task in the shortest time. Thanks in advance!

    Read the article

  • Accessing SQL Data Services via ADO.NET Data Service Client Library

    - by Mehmet Aras
    Is this possible? Basically I would like to use SQL Data Services REST interface and let the ADO.NET Data Service Client library handle communication details and generate the entities that I can use. I looked at the samples in February release of Azure services kit but the samples in there are using HttpWebRequest and HttpWebResponse to consume SQL Data Services RESTfully. I was hoping to use ADO.NET Data Service Client library to abstract low-level details away.

    Read the article

  • Online network mining software

    - by ron
    A year ago I stumbled upon a website which provided an online application for building a network online. For example, I entered some urls and phrases, and it automatically searched them for news, inserted the connections between them, etc. I can't find it now. Do you know such software?

    Read the article

  • Information Extraction Toolkits

    - by MathGladiator
    I'm looking for information extraction libraries where I can have semi structured information that may have either hidden or incomplete data. I want to train some classifiers to pull out content based on the structure. I'm working on building a tool where I can select text in the browser, and it will generate (via some web service call) a classifier that can be used on other documents to pull out text. I'm primarily looking at how the structure of the document can be used to indicate what the content is.

    Read the article

  • Bridging Two Worlds: Big Data and Enterprise Data

    - by Dain C. Hansen
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} The big data world is all the vogue in today’s IT conversations. It’s a world of volume, velocity, variety – tantalizing us with its untapped potential. It’s a world of transformational game-changing technologies that have already begun to alter the information management landscape. One of the reasons that big data is so compelling is that it’s a universal challenge that impacts every one of us. Whether it is healthcare, financial, manufacturing, government, retail - big data presents a pressing problem for many industries: how can so much information be processed so quickly to deliver the ‘bigger’ picture? With big data we’re tapping into new information that didn’t exist before: social data, weblogs, sensor data, complex content, and more. What also makes big data revolutionary is that it turns traditional information architecture on its head, putting into question commonly accepted notions of where and how data should be aggregated processed, analyzed, and stored. This is where Hadoop and NoSQL come in – new technologies which solve new problems for managing unstructured data. And now for some worst practices that I'd recommend that you please not follow: Worst Practice Lesson 1: Throw away everything that you already know about data management, data integration tools, and start completely over. One shouldn’t forget what’s already running in today’s IT. Today’s Business Analytics, Data Warehouses, Business Applications (ERP, CRM, SCM, HCM), and even many social, mobile, cloud applications still rely almost exclusively on structured data – or what we’d like to call enterprise data. This dilemma is what today’s IT leaders are up against: what are the best ways to bridge enterprise data with big data? And what are the best strategies for dealing with the complexities of these two unique worlds? Worst Practice Lesson 2: Throw away all of your existing business applications … because they don’t run on big data yet. Bridging the two worlds of big data and enterprise data means considering solutions that are complete, based on emerging Hadoop technologies (as well as traditional), and are poised for success through integrated design tools, integrated platforms that connect to your existing business applications, as well as and support real-time analytics. Leveraging these types of best practices translates to improved productivity, lowered TCO, IT optimization, and better business insights. Worst Practice Lesson 3: Separate out [and keep separate] your big data sandboxes from all the current enterprise IT systems. Don’t mix sand among playgrounds. We didn't tell you that you wouldn't get dirty doing this. Correlation between the two worlds is key. The real advantage to analyzing big data comes when you can correlate it with the existing data in your data warehouse or your current applications to make sense of the larger patterns. If you have not followed these worst practices 1-3 then you qualify for the first step of our journey: bridging the two worlds of enterprise data and big data. Over the next several weeks we’ll be discussing this topic along with several others around big data as it relates to data integration. We welcome you to join us in the conversation by following us on twitter on #BridgingBigData or download our latest white paper and resource kit: Big Data and Enterprise Data: Bridging Two Worlds.

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >