Search Results

Search found 59554 results on 2383 pages for 'distributed data mining'.

Page 3/2383 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Can an issue tracking system be distributed?

    - by Klaim
    I was thinking about issue tracking software like Redmine, Trac or even the one that is in Fossil and something hit me: Is there a reason why Redmine and Trac are not possible to be distributed? Or maybe it's possible and I just don't know how it's possible? If it's not possible, why? By distributed I mean like Facebook or Google or other applications that effectively runs on multiple hardware a the same time but share data.

    Read the article

  • timetable in a jTable

    - by chandra
    I want to create a timetable in a jTable. For the top row it will display from monday to sunday and the left colume will display the time of the day with 2h interval e.g 1st colume (0000 - 0200), 2nd colume (0200 - 0400) .... And if i click a button the timing will change from 2h interval to 1h interval. I do not want to hardcode it because i need to do for 2h, 1h, 30min , 15min, 1min, 30sec and 1 sec interval and it will take too long for me to hardcode. Can anyone show me an example or help me create an example for the 2h to 1h interval so that i know what to do? The data array is for me to store data and are there any other easier or shortcuts for me to store them because if it is in 1 sec interval i got thousands of array i need to type it out. private void oneHour() //1 interval functions { if(!once) { initialize(); once = true; } jTable.setModel(new javax.swing.table.DefaultTableModel( new Object [][] { {"0000 - 0100", data[0][0], data[0][1], data[0][2], data[0][3], data[0][4], data[0][5], data[0][6]}, {"0100 - 0200", data[2][0], data[2][1], data[2][2], data[2][3], data[2][4], data[2][5], data[2][6]}, {"0200 - 0300", data[4][0], data[4][1], data[4][2], data[4][3], data[4][4], data[4][5], data[4][6]}, {"0300 - 0400", data[6][0], data[6][1], data[6][2], data[6][3], data[6][4], data[6][5], data[6][6]}, {"0400 - 0600", data[8][0], data[8][1], data[8][2], data[8][3], data[8][4], data[8][5], data[8][6]}, {"0600 - 0700", data[10][0], data[4][1], data[10][2], data[10][3], data[10][4], data[10][5], data[10][6]}, {"0700 - 0800", data[12][0], data[12][1], data[12][2], data[12][3], data[12][4], data[12][5], data[12][6]}, {"0800 - 0900", data[14][0], data[14][1], data[14][2], data[14][3], data[14][4], data[14][5], data[14][6]}, {"0900 - 1000", data[16][0], data[16][1], data[16][2], data[16][3], data[16][4], data[16][5], data[16][6]}, {"1000 - 1100", data[18][0], data[18][1], data[18][2], data[18][3], data[18][4], data[18][5], data[18][6]}, {"1100 - 1200", data[20][0], data[20][1], data[20][2], data[20][3], data[20][4], data[20][5], data[20][6]}, {"1200 - 1300", data[22][0], data[22][1], data[22][2], data[22][3], data[22][4], data[22][5], data[22][6]}, {"1300 - 1400", data[24][0], data[24][1], data[24][2], data[24][3], data[24][4], data[24][5], data[24][6]}, {"1400 - 1500", data[26][0], data[26][1], data[26][2], data[26][3], data[26][4], data[26][5], data[26][6]}, {"1500 - 1600", data[28][0], data[28][1], data[28][2], data[28][3], data[28][4], data[28][5], data[28][6]}, {"1600 - 1700", data[30][0], data[30][1], data[30][2], data[30][3], data[30][4], data[30][5], data[30][6]}, {"1700 - 1800", data[32][0], data[32][1], data[32][2], data[32][3], data[32][4], data[32][5], data[32][6]}, {"1800 - 1900", data[34][0], data[34][1], data[34][2], data[34][3], data[34][4], data[34][5], data[34][6]}, {"1900 - 2000", data[36][0], data[36][1], data[36][2], data[36][3], data[36][4], data[36][5], data[36][6]}, {"2000 - 2100", data[38][0], data[38][1], data[38][2], data[38][3], data[38][4], data[38][5], data[38][6]}, {"2100 - 2200", data[40][0], data[40][1], data[40][2], data[40][3], data[40][4], data[40][5], data[40][6]}, {"2200 - 2300", data[42][0], data[42][1], data[42][2], data[42][3], data[42][4], data[42][5], data[42][6]}, {"2300 - 2400", data[44][0], data[44][1], data[44][2], data[44][3], data[44][4], data[44][5], data[44][6]}, {"2400 - 0000", data[46][0], data[46][1], data[46][2], data[46][3], data[46][4], data[46][5], data[46][6]}, }, new String [] { "Time/Day", "(Mon)", "(Tue)", "(Wed)", "(Thurs)", "(Fri)", "(Sat)", "(Sun)" } )); } private void twoHour() //2 hour interval functions { if(!once) { initialize(); once = true; } jTable.setModel(new javax.swing.table.DefaultTableModel( new Object [][] { {"0000 - 0200", data[0][0], data[0][1], data[0][2], data[0][3], data[0][4], data[0][5], data[0][6]}, {"0200 - 0400", data[4][0], data[4][1], data[4][2], data[4][3], data[4][4], data[4][5], data[4][6]}, {"0400 - 0600", data[8][0], data[8][1], data[8][2], data[8][3], data[8][4], data[8][5], data[8][6]}, {"0600 - 0800", data[12][0], data[12][1], data[12][2], data[12][3], data[12][4], data[12][5], data[12][6]}, {"0800 - 1000", data[16][0], data[16][1], data[16][2], data[16][3], data[16][4], data[16][5], data[16][6]}, {"1000 - 1200", data[20][0], data[20][1], data[20][2], data[20][3], data[20][4], data[20][5], data[20][6]}, {"1200 - 1400", data[24][0], data[24][1], data[24][2], data[24][3], data[24][4], data[24][5], data[24][6]}, {"1400 - 1600", data[28][0], data[28][1], data[28][2], data[28][3], data[28][4], data[28][5], data[28][6]}, {"1600 - 1800", data[32][0], data[32][1], data[32][2], data[32][3], data[32][4], data[32][5], data[32][6]}, {"1800 - 2000", data[36][0], data[36][1], data[36][2], data[36][3], data[36][4], data[36][5], data[36][6]}, {"2000 - 2200", data[40][0], data[40][1], data[40][2], data[40][3], data[40][4], data[40][5], data[40][6]}, {"2200 - 2400",data[44][0], data[44][1], data[44][2], data[44][3], data[44][4], data[44][5], data[44][6]} },

    Read the article

  • Evaluating Oracle Data Mining Has Never Been Easier - Evaluation "Kit" Available

    - by chberger
    Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} Now you can quickly and easily get set up to starting using Oracle Data Mining for evaluation purposes. Just go to the Oracle Technology Network (OTN) and follow these simple steps. Oracle Data Mining Evaluation "Kit" Instructions Step 1: Download and Install the Oracle Database 11g Release 2 Anyone can download and install the Oracle Database for free for evaluation purposes. Read OTN web site for details. 11.2.0.1.0 DB is the minimum, 11.2.0.2 is better and naturally 11.2.0.3 is best if you are a current customer and on active support. Either 32-bit or 64-bit is fine. 4GB of RAM or more works fine for SQL Developer and the Oracle Data Miner GUI extension. Downloading the database and installing it should take just about an hour or so, depending on your network and computer. For more instructions on setting up Oracle Data Mining see: http://www.oracle.com/technetwork/database/options/odm/dataminerworkflow-168677.html When you install the Oracle Database, the Sample Examples data should also be installed e.g.:Release 2 Examples win32_11gR2_examples.zip (565,154,740 bytes). Contains examples of how to use the Oracle Database. Download if you are new to Oracle and want to try some of the examples presented in the Documentation Step 2: Install SQL Developer 3.1 (the Oracle Data Mining Extension installs automatically) Step 3. Follow the four free step-by-step Oracle-by-Examples e-training lessons: Setting Up Oracle Data Miner 11g Release 2 This tutorial covers the process of setting up Oracle Data Miner 11g Release 2 for use within Oracle SQL Developer 3.0. Using Oracle Data Miner 11g Release 2 This tutorial covers the use of Oracle Data Miner to perform data mining against Oracle Database 11g Release 2. In this lesson, you examine and solve a data mining business problem by using the Oracle Data Miner graphical user interface (GUI). Star Schema Mining Using Oracle Data Miner This tutorial covers the use of Oracle Data Miner to perform star schema mining against Oracle Database 11g Release 2. Text Mining Using Oracle Data Miner This tutorial covers the use of Oracle Data Miner to perform text mining against Oracle Database 11g Release 2. That’s it! Easy, fun and the fastest way to get started evaluating Oracle Data Mining. Enjoy! Charlie

    Read the article

  • New R Interface to Oracle Data Mining Available for Download

    - by charlie.berger
      The R Interface to Oracle Data Mining ( R-ODM) allows R users to access the power of Oracle Data Mining's in-database functions using the familiar R syntax. R-ODM provides a powerful environment for prototyping data analysis and data mining methodologies. R-ODM is especially useful for: Quick prototyping of vertical or domain-based applications where the Oracle Database supports the application Scripting of "production" data mining methodologies Customizing graphics of ODM data mining results (examples: classification, regression, anomaly detection) The R-ODM interface allows R users to mine data using Oracle Data Mining from the R programming environment. It consists of a set of function wrappers written in source R language that pass data and parameters from the R environment to the Oracle RDBMS enterprise edition as standard user PL/SQL queries via an ODBC interface. The R-ODM interface code is a thin layer of logic and SQL that calls through an ODBC interface. R-ODM does not use or expose any Oracle product code as it is completely an external interface and not part of any Oracle product. R-ODM is similar to the example scripts (e.g., the PL/SQL demo code) that illustrates the use of Oracle Data Mining, for example, how to create Data Mining models, pass arguments, retrieve results etc. R-ODM is packaged as a standard R source package and is distributed freely as part of the R environment's Comprehensive R Archive Network (CRAN). For information about the R environment, R packages and CRAN, see www.r-project.org. R-ODM is particularly intended for data analysts and statisticians familiar with R but not necessarily familiar with the Oracle database environment or PL/SQL. It is a convenient environment to rapidly experiment and prototype Data Mining models and applications. Data Mining models prototyped in the R environment can easily be deployed in their final form in the database environment, just like any other standard Oracle Data Mining model. What is R? R is a system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files. The design of R has been heavily influenced by two existing languages: Becker, Chambers & Wilks' S and Sussman's Scheme. Whereas the resulting language is very similar in appearance to S, the underlying implementation and semantics are derived from Scheme. R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the University of Auckland in Auckland, New Zealand. Since mid-1997 there has been a core group (the "R Core Team") who can modify the R source code archive. Besides this core group many R users have contributed application code as represented in the near 1,500 publicly-available packages in the CRAN archive (which has shown exponential growth since 2001; R News Volume 8/2, October 2008). Today the R community is a vibrant and growing group of dozens of thousands of users worldwide. It is free software distributed under a GNU-style copyleft, and an official part of the GNU project ("GNU S"). Resources: R website / CRAN R-ODM

    Read the article

  • SQL Rally Pre-Con: Data Warehouse Modeling – Making the Right Choices

    - by Davide Mauri
    As you may have already learned from my old post or Adam’s or Kalen’s posts, there will be two SQL Rally in North Europe. In the Stockholm SQL Rally, with my friend Thomas Kejser, I’ll be delivering a pre-con on Data Warehouse Modeling: Data warehouses play a central role in any BI solution. It's the back end upon which everything in years to come will be created. For this reason, it must be rock solid and yet flexible at the same time. To develop such a data warehouse, you must have a clear idea of its architecture, a thorough understanding of the concepts of Measures and Dimensions, and a proven engineered way to build it so that quality and stability can go hand-in-hand with cost reduction and scalability. In this workshop, Thomas Kejser and Davide Mauri will share all the information they learned since they started working with data warehouses, giving you the guidance and tips you need to start your BI project in the best way possible?avoiding errors, making implementation effective and efficient, paving the way for a winning Agile approach, and helping you define how your team should work so that your BI solution will stand the test of time. You'll learn: Data warehouse architecture and justification Agile methodology Dimensional modeling, including Kimball vs. Inmon, SCD1/SCD2/SCD3, Junk and Degenerate Dimensions, and Huge Dimensions Best practices, naming conventions, and lessons learned Loading the data warehouse, including loading Dimensions, loading Facts (Full Load, Incremental Load, Partitioned Load) Data warehouses and Big Data (Hadoop) Unit testing Tracking historical changes and managing large sizes With all the Self-Service BI hype, Data Warehouse is become more and more central every day, since if everyone will be able to analyze data using self-service tools, it’s better for him/her to rely on correct, uniform and coherent data. Already 50 people registered from the workshop and seats are limited so don’t miss this unique opportunity to attend to this workshop that is really a unique combination of years and years of experience! http://www.sqlpass.org/sqlrally/2013/nordic/Agenda/PreconferenceSeminars.aspx See you there!

    Read the article

  • Oracle Financial Analytics for SAP Certified with Oracle Data Integrator EE

    - by denis.gray
    Two days ago Oracle announced the release of Oracle Financial Analytics for SAP.  With the amount of press this has garnered in the past two days, there's a key detail that can't be missed.  This release is certified with Oracle Data Integrator EE - now making the combination of Data Integration and Business Intelligence a force to contend with.  Within the Oracle Press Release there were two important bullets: ·         Oracle Financial Analytics for SAP includes a pre-packaged ABAP code compliant adapter and is certified with Oracle Data Integrator Enterprise Edition to integrate SAP Financial Accounting data directly with the analytic application.  ·         Helping to integrate SAP financial data and disparate third-party data sources is Oracle Data Integrator Enterprise Edition which delivers fast, efficient loading and transformation of timely data into a data warehouse environment through its high-performance Extract Load and Transform (E-LT) technology. This is very exciting news, demonstrating Oracle's overall commitment to Oracle Data Integrator EE.   This is a great way to start off the new year and we look forward to building on this momentum throughout 2011.   The following links contain additional information and media responses about the Oracle Financial Analytics for SAP release. IDG News Service (Also appeared in PC World, Computer World, CIO: "Oracle is moving further into rival SAP's turf with Oracle Financial Analytics for SAP, a new BI (business intelligence) application that can crunch ERP (enterprise resource planning) system financial data for insights." Information Week: "Oracle talks a good game about the appeal of an optimized, all-Oracle stack. But the company also recognizes that we live in a predominantly heterogeneous IT world" CRN: "While some businesses with SAP Financial Accounting already use Oracle BI, those integrations had to be custom developed. The new offering provides pre-built integration capabilities." ECRM Guide:  "Among other features, Oracle Financial Analytics for SAP helps front-line managers improve financial performance and decision-making with what the company says is comprehensive, timely and role-based information on their departments' expenses and revenue contributions."   SAP Getting Started Guide for ODI on OTN: http://www.oracle.com/technetwork/middleware/data-integrator/learnmore/index.html For more information on the ODI and its SAP connectivity please review the Oracle® Fusion Middleware Application Adapters Guide for Oracle Data Integrator11g Release 1 (11.1.1)

    Read the article

  • Distributed Computing Framework (.NET) - Specifically for CPU Instensive operations

    - by StevenH
    I am currently researching the options that are available (both Open Source and Commercial) for developing a distributed application. "A distributed system consists of multiple autonomous computers that communicate through a computer network." Wikipedia The application is focused on distributing highly cpu intensive operations (as opposed to data intensive) so I'm sure MapReduce solutions don't fit the bill. Any framework that you can recommend ( + give a brief summary of any experience or comparison to other frameworks ) would be greatly appreciated. Thanks.

    Read the article

  • What is the definition of "Big Data"?

    - by Ben
    Is there one? All the definitions I can find describe the size, complexity / variety or velocity of the data. Wikipedia's definition is the only one I've found with an actual number Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set. However, this seemingly contradicts the MIKE2.0 definition, referenced in the next paragraph, which indicates that "big" data can be small and that 100,000 sensors on an aircraft creating only 3GB of data could be considered big. IBM despite saying that: Big data is more simply than a matter of size. have emphasised size in their definition. O'Reilly has stressed "volume, velocity and variety" as well. Though explained well, and in more depth, the definition seems to be a re-hash of the others - or vice-versa of course. I think that a Computer Weekly article title sums up a number of articles fairly well "What is big data and how can it be used to gain competitive advantage". But ZDNet wins with the following from 2012: “Big Data” is a catch phrase that has been bubbling up from the high performance computing niche of the IT market... If one sits through the presentations from ten suppliers of technology, fifteen or so different definitions are likely to come forward. Each definition, of course, tends to support the need for that supplier’s products and services. Imagine that. Basically "big data" is "big" in some way shape or form. What is "big"? Is it quantifiable at the current time? If "big" is unquantifiable is there a definition that does not rely solely on generalities?

    Read the article

  • Big Data – Operational Databases Supporting Big Data – Columnar, Graph and Spatial Database – Day 14 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the Key-Value Pair Databases and Document Databases in the Big Data Story. In this article we will understand the role of Columnar, Graph and Spatial Database supporting Big Data Story. Now we will see a few of the examples of the operational databases. Relational Databases (The day before yesterday’s post) NoSQL Databases (The day before yesterday’s post) Key-Value Pair Databases (Yesterday’s post) Document Databases (Yesterday’s post) Columnar Databases (Tomorrow’s post) Graph Databases (Today’s post) Spatial Databases (Today’s post) Columnar Databases  Relational Database is a row store database or a row oriented database. Columnar databases are column oriented or column store databases. As we discussed earlier in Big Data we have different kinds of data and we need to store different kinds of data in the database. When we have columnar database it is very easy to do so as we can just add a new column to the columnar database. HBase is one of the most popular columnar databases. It uses Hadoop file system and MapReduce for its core data storage. However, remember this is not a good solution for every application. This is particularly good for the database where there is high volume incremental data is gathered and processed. Graph Databases For a highly interconnected data it is suitable to use Graph Database. This database has node relationship structure. Nodes and relationships contain a Key Value Pair where data is stored. The major advantage of this database is that it supports faster navigation among various relationships. For example, Facebook uses a graph database to list and demonstrate various relationships between users. Neo4J is one of the most popular open source graph database. One of the major dis-advantage of the Graph Database is that it is not possible to self-reference (self joins in the RDBMS terms) and there might be real world scenarios where this might be required and graph database does not support it. Spatial Databases  We all use Foursquare, Google+ as well Facebook Check-ins for location aware check-ins. All the location aware applications figure out the position of the phone with the help of Global Positioning System (GPS). Think about it, so many different users at different location in the world and checking-in all together. Additionally, the applications now feature reach and users are demanding more and more information from them, for example like movies, coffee shop or places see. They are all running with the help of Spatial Databases. Spatial data are standardize by the Open Geospatial Consortium known as OGC. Spatial data helps answering many interesting questions like “Distance between two locations, area of interesting places etc.” When we think of it, it is very clear that handing spatial data and returning meaningful result is one big task when there are millions of users moving dynamically from one place to another place & requesting various spatial information. PostGIS/OpenGIS suite is very popular spatial database. It runs as a layer implementation on the RDBMS PostgreSQL. This makes it totally unique as it offers best from both the worlds. Courtesy: mushroom network Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Hive. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Can't save data for a member in a data form

    - by RahulS
    Implied sharing is an old thing everyone knows the reasons and solutions of that, still little theory about that: With Essbase implied sharing, some members are shared even if you do not explicitly set them as shared. These members are implied shared members. When an implied share relationship is created, each implied member assumes the other member’s value. Essbase assumes (or implies) a shared member relationship in these situations: 1. A parent has only one child 2. A parent has only one child that consolidates to the parent In a Planning form that contains members with an implied sharing relationship, when a value is added for the parent, the child assumes the same value after the form is saved. Likewise, if a value is added for the child, the parent usually assumes the same value after a form is saved.For example, when a calculation script or load rule populates an implied share member, the other implied share member assumes the value of the member populated by the calculation script or load rule. The last value calculated or imported takes precedence. The result is the same whether you refer to the parent or the child as a variable in a calculation script. For more information have a look at: http://docs.oracle.com/cd/E17236_01/epm.1112/hp_admin_11122/ch14s11.html Now the issue which we are going to talk about is We loose data on save even when the parent is dynamic calc and has a single child. A dynamic calc parent to a single child:  If we design the form with following selection: In the data form we will find parent below the member and this is by design whenever you make a selection using commands to select all the member below parent, always children will appear before the parent: Lets try to enter data, Save it Now, try to change the way we selected members Here we go: Now the question again why this behavior: 1. Data from Planning data form passes to Essbase row by row, 2. Because in data form the child member appears before the parent, 3. First, data goes to Essbase for child (SingleStoreChild), 4. Then when Planning passes the data for parent there was #Missing or No data,  5. Over writes the data to #missing. PS: As we know that dynamic calc members are calculated on the fly they are not allocated with any memory in the Essbase, here the parent was dynamic calc and it was pointing to same memory as child in the background, when Planning was passing data to Essbase for second row it has updated the child with missing data.(Little confusing, let me know if you need more explanation) 6. As one of the solutions just change the order of appearance of parent and child. Cheers..!!! Rahul S. https://www.facebook.com/pages/HyperionPlanning/117320818374228

    Read the article

  • Distributed Development Tools -- (Version control and Project Management)

    - by Macy Abbey
    Hello, I've recently become responsible for choosing which source control and project management software to use for a company that employs me. Currently it uses Jira (project management) and Subversion (version control). I know there are many other options out there -- the ones I know about are all in this article http://mashable.com/2010/07/14/distributed-developer-teams/ . I'm leaning towards recommending they just stay with what they have as it seems workable and any change would have to be worth the cost of switching to say github/basecamp or some other solution. Some details on the team: It's a distributed development shop. Meetings of the whole team in one room are rare. It's currently a very small development team (three developers). The project management software is used by developers and a product manager or two. What are you experiences with version control and project management web applications? Are there any you would recommend and you think are worth the switching cost of time to learn new services / implementing the change? Edit: After educating myself further on the options it appears DVCS offer powerful benefits that may be worth investing in now as opposed to later in the company's lifetime when the switching cost is higher: I'm a Subversion geek, why I should consider or not consider Mercurial or Git or any other DVCS?

    Read the article

  • Distributed Development Tools -- (Version control and Project Management)

    - by Macy Abbey
    I've recently become responsible for choosing which source control and project management software to use for a company that employs me. Currently it uses Jira (project management) and Subversion (version control). I know there are many other options out there -- the ones I know about are all in this article http://mashable.com/2010/07/14/distributed-developer-teams/ . I'm leaning towards recommending they just stay with what they have as it seems workable and any change would have to be worth the cost of switching to say github/basecamp or some other solution. Some details on the team: It's a distributed development shop. Meetings of the whole team in one room are rare. It's currently a very small development team (three developers). The project management software is used by developers and a product manager or two. What are you experiences with version control and project management web applications? Are there any you would recommend and you think are worth the switching cost of time to learn new services / implementing the change? Edit: After educating myself further on the options it appears DVCS offer powerful benefits that may be worth investing in now as opposed to later in the company's lifetime when the switching cost is higher: I'm a Subversion geek, why I should consider or not consider Mercurial or Git or any other DVCS?

    Read the article

  • Is Data Science “Science”?

    - by BuckWoody
    I hold the term “science” in very high esteem. I grew up on the Space Coast in Florida, and eventually worked at the Kennedy Space Center, surrounded by very intelligent people who worked in various scientific fields. Recently a new term has entered the computing dialog – “Data Scientist”. Since it’s not a standard term, it has a lot of definitions, and in fact has been disputed as a correct term. After all, the reasoning goes, if there’s no such thing as “Data Science” then how can there be a Data Scientist? This argument has been made before, albeit with a different term – “Computer Science”. In Peter Denning’s excellent article “Is Computer Science Science” (April  2005/Vol. 48, No. 4 COMMUNICATIONS OF THE ACM) there are many points that separate “science” from “engineering” and even “art”.  I won’t repeat the content of that article here (I recommend you read it on your own) but will leverage the points he makes there. Definition of Science To ask the question “is data science ‘science’” then we need to start with a definition of terms. Various references put the definition into the same basic areas: Study of the physical world Systematic and/or disciplined study of a subject area ...and then they include the things studied, the bodies of knowledge and so on. The word itself comes from Latin, and means merely “to know” or “to study to know”. Greek divides knowledge further into “truth” (episteme), and practical use or effects (tekhne). Normally computing falls into the second realm. Definition of Data Science And now a more controversial definition: Data Science. This term is so new and perhaps so niche that the major dictionaries haven’t yet picked it up (my OED reference is older – can’t afford to pop for the online registration at present). Researching the term's general use I created an amalgam of the definitions this way: “Studying and applying mathematical and other techniques to derive information from complex data sets.” Using this definition, data science certainly seems to be science - it's learning about and studying some object or area using systematic methods. But implicit within the definition is the word “application”, which makes the process more akin to engineering or even technology than science. In fact, I find that using these techniques – and data itself – part of science, not science itself. I leave out the concept of studying data patterns or algorithms as part of this discipline. That is actually a domain I see within research, mathematics or computer science. That of course is a type of science, but does not seek for practical applications. As part of the argument against calling it “Data Science”, some point to the scientific method of creating a hypothesis, testing with controls, testing results against the hypothesis, and documenting for repeatability.  These are not steps that we often take in working with data. We normally start with a question, and fit patterns and algorithms to predict outcomes and find correlations. In this way Data Science is more akin to statistics (and in fact makes heavy use of them) in the process rather than starting with an assumption and following on with it. So, is Data Science “Science”? I’m uncertain – and I’m uncertain it matters. Even if we are facing rampant “title inflation” these days (does anyone introduce themselves as a secretary or supervisor anymore?) I can tolerate the term at least from the intent that we use data to study problems across a wide spectrum, rather than restricting it to a single domain. And I also understand those who have worked hard to achieve the very honorable title of “scientist” who have issues with those who borrow the term without asking. What do you think? Science, or not? Does it matter?

    Read the article

  • Data Modeling Resources

    - by Dejan Sarka
    You can find many different data modeling resources. It is impossible to list all of them. I selected only the most valuable ones for me, and, of course, the ones I contributed to. Books Chris J. Date: An Introduction to Database Systems – IMO a “must” to understand the relational model correctly. Terry Halpin, Tony Morgan: Information Modeling and Relational Databases – meet the object-role modeling leaders. Chris J. Date, Nikos Lorentzos and Hugh Darwen: Time and Relational Theory, Second Edition: Temporal Databases in the Relational Model and SQL – all theory needed to manage temporal data. Louis Davidson, Jessica M. Moss: Pro SQL Server 2012 Relational Database Design and Implementation – the best SQL Server focused data modeling book I know by two of my friends. Dejan Sarka, et al.: MCITP Self-Paced Training Kit (Exam 70-441): Designing Database Solutions by Using Microsoft® SQL Server™ 2005 – SQL Server 2005 data modeling training kit. Most of the text is still valid for SQL Server 2008, 2008 R2, 2012 and 2014. Itzik Ben-Gan, Lubor Kollar, Dejan Sarka, Steve Kass: Inside Microsoft SQL Server 2008 T-SQL Querying – Steve wrote a chapter with mathematical background, and I added a chapter with theoretical introduction to the relational model. Itzik Ben-Gan, Dejan Sarka, Roger Wolter, Greg Low, Ed Katibah, Isaac Kunen: Inside Microsoft SQL Server 2008 T-SQL Programming – I added three chapters with theoretical introduction and practical solutions for the user-defined data types, dynamic schema and temporal data. Dejan Sarka, Matija Lah, Grega Jerkic: Training Kit (Exam 70-463): Implementing a Data Warehouse with Microsoft SQL Server 2012 – my first two chapters are about data warehouse design and implementation. Courses Data Modeling Essentials – I wrote a 3-day course for SolidQ. If you are interested in this course, which I could also deliver in a shorter seminar way, you can contact your closes SolidQ subsidiary, or, of course, me directly on addresses [email protected] or [email protected]. This course could also complement the existing courseware portfolio of training providers, which are welcome to contact me as well. Logical and Physical Modeling for Analytical Applications – online course I wrote for Pluralsight. Working with Temporal data in SQL Server – my latest Pluralsight course, where besides theory and implementation I introduce many original ways how to optimize temporal queries. Forthcoming presentations SQL Bits 12, July 17th – 19th, Telford, UK – I have a full-day pre-conference seminar Advanced Data Modeling Topics there.

    Read the article

  • JavaScript distributed computing project

    - by Ben L.
    I made a website that does absolutely nothing, and I've proven to myself that people like to stay there - I've already logged 11+ hours worth of cumulative time on the page. My question is whether it would be possible (or practical) to use the website as a distributed computing site. My first impulse was to find out if there were any JavaScript distributed computing projects already active, so that I could put a piece of code on the page and be done. Unfortunately, all I could find was a big list of websites that thought it might be a cool idea. I'm thinking that I might want to start with something like integer factorization - in this case, RSA numbers. It would be easy for the server to check if an answer was correct (simply test for modulus equals zero), and also easy to implement. Is my idea feasible? Is there already a project out there that I can use?

    Read the article

  • Why Oracle Data Integrator for Big Data?

    - by Mala Narasimharajan
    Big Data is everywhere these days - but what exactly is it? It’s data that comes from a multitude of sources – not only structured data, but unstructured data as well.  The sheer volume of data is mindboggling – here are a few examples of big data: climate information collected from sensors, social media information, digital pictures, log files, online video files, medical records or online transaction records.  These are just a few examples of what constitutes big data.   Embedded in big data is tremendous value and being able to manipulate, load, transform and analyze big data is key to enhancing productivity and competitiveness.  The value of big data lies in its propensity for greater in-depth analysis and data segmentation -- in turn giving companies detailed information on product performance, customer preferences and inventory.  Furthermore, by being able to store and create more data in digital form, “big data can unlock significant value by making information transparent and usable at much higher frequency." (McKinsey Global Institute, May 2011) Oracle's flagship product for bulk data movement and transformation, Oracle Data Integrator, is a critical component of Oracle’s Big Data strategy. ODI provides automation, bulk loading, and validation and transformation capabilities for Big Data while minimizing the complexities of using Hadoop.  Specifically, the advantages of ODI in a Big Data scenario are due to pre-built Knowledge Modules that drive processing in Hadoop. This leverages the graphical UI to load and unload data from Hadoop, perform data validations and create mapping expressions for transformations.  The Knowledge Modules provide a key jump-start and eliminate a significant amount of Hadoop development.  Using Oracle Data Integrator together with Oracle Big Data Connectors, you can simplify the complexities of mapping, accessing, and loading big data (via NoSQL or HDFS) but also correlating your enterprise data – this correlation may require integrating across heterogeneous and standards-based environments, connecting to Oracle Exadata, or sourcing via a big data platform such as Oracle Big Data Appliance. To learn more about Oracle Data Integration and Big Data, download our resource kit to see the latest in whitepapers, webinars, downloads, and more… or go to our website on www.oracle.com/bigdata

    Read the article

  • Distributed Transaction Framework across webservices

    - by John Petrak
    I am designing a new system that has one central web service and several site web services which are spread across the country and some overseas. It has some data that must be identical on all sites. So my plan is to maintain that data in the central web service and then "sync" the data to sites. This includes inserts, edits and deletes. I see a problem when deleting, if one site has used the record, then I need to undo the delete that has happened on the other servers. This lead me to idea that I need some sort of transaction system that can work across different web servers. Before I design one from scratch, I would like to know if anyone has come across this sort of problem and if there are any frame works or even design patterns that might aid me?

    Read the article

  • Fraud and Anomaly Detection using Oracle Data Mining YouTube-like Video

    - by chberger
    I've created and recorded another YouTube-like presentation and "live" demos of Oracle Advanced Analytics Option, this time focusing on Fraud and Anomaly Detection using Oracle Data Mining.  [Note:  It is a large MP4 file that will open and play in place.  The sound quality is weak so you may need to turn up the volume.] Data is your most valuable asset. It represents the entire history of your organization and its interactions with your customers.  Predictive analytics leverages data to discover patterns, relationships and to help you even make informed predictions.   Oracle Data Mining (ODM) automatically discovers relationships hidden in data.  Predictive models and insights discovered with ODM address business problems such as:  predicting customer behavior, detecting fraud, analyzing market baskets, profiling and loyalty.  Oracle Data Mining, part of the Oracle Advanced Analytics (OAA) Option to the Oracle Database EE, embeds 12 high performance data mining algorithms in the SQL kernel of the Oracle Database. This eliminates data movement, delivers scalability and maintains security.  But, how do you find these very important needles or possibly fraudulent transactions and huge haystacks of data? Oracle Data Mining’s 1 Class Support Vector Machine algorithm is specifically designed to identify rare or anomalous records.  Oracle Data Mining's 1-Class SVM anomaly detection algorithm trains on what it believes to be considered “normal” records, build a descriptive and predictive model which can then be used to flags records that, on a multi-dimensional basis, appear to not fit in--or be different.  Combined with clustering techniques to sort transactions into more homogeneous sub-populations for more focused anomaly detection analysis and Oracle Business Intelligence, Enterprise Applications and/or real-time environments to "deploy" fraud detection, Oracle Data Mining delivers a powerful advanced analytical platform for solving important problems.  With OAA/ODM you can find suspicious expense report submissions, flag non-compliant tax submissions, fight fraud in healthcare claims and save huge amounts of money in fraudulent claims  and abuse.   This presentation and several brief demos will show Oracle Data Mining's fraud and anomaly detection capabilities.  

    Read the article

  • General Policies and Procedures for Maintaining the Value of Data Assets

    Here is a general list for policies and procedures regarding maintaining the value of data assets. Data Backup Policies and Procedures Backups are very important when dealing with data because there is always the chance of losing data due to faulty hardware or a user activity. So the need for a strategic backup system should be mandatory for all companies. This being said, in the real world some companies that I have worked for do not really have a good data backup plan. Typically when companies tend to take this kind of approach in data backups usually the data is not really recoverable.  Unfortunately when companies do not regularly test their backup plans they get a false sense of security because they think that they are covered. However, I can tell you from personal and professional experience that a backup plan/system is never fully implemented until it is regularly tested prior to the time when it actually needs to be used. Disaster Recovery Plan Expanding on Backup Policies and Procedures, a company needs to also have a disaster recovery plan in order to protect its data in case of a catastrophic disaster.  Disaster recovery plans typically encompass how to restore all of a company’s data and infrastructure back to a restored operational status.  Most Disaster recovery plans also include time estimates on how long each step of the disaster recovery plan should take to be executed.  It is important to note that disaster recovery plans are never fully implemented until they have been tested just like backup plans. Disaster recovery plans should be tested regularly so that the business can be confident in not losing any or minimal data due to a catastrophic disaster. Firewall Policies and Content Filters One way companies can protect their data is by using a firewall to separate their internal network from the outside. Firewalls allow for enabling or disabling network access as data passes through it by applying various defined restrictions. Furthermore firewalls can also be used to prevent access from the internal network to the outside by these same factors. Common Firewall Restrictions Destination/Sender IP Address Destination/Sender Host Names Domain Names Network Ports Companies can also desire to restrict what their network user’s view on the internet through things like content filters. Content filters allow a company to track what webpages a person has accessed and can also restrict user’s access based on established rules set up in the content filter. This device and/or software can block access to domains or specific URLs based on a few factors. Common Content Filter Criteria Known malicious sites Specific Page Content Page Content Theme  Anti-Virus/Mal-ware Polices Fortunately, most companies utilize antivirus programs on all computers and servers for good reason, virus have been known to do the following: Corrupt/Invalidate Data, Destroy Data, and Steal Data. Anti-Virus applications are a great way to prevent any malicious application from being able to gain access to a company’s data.  However, anti-virus programs must be constantly updated because new viruses are always being created, and the anti-virus vendors need to distribute updates to their applications so that they can catch and remove them. Data Validation Policies and Procedures Data validation is very important to ensure that only accurate information is stored. The existence of invalid data can cause major problems when businesses attempt to use data for knowledge based decisions and for performance reporting. Data Scrubbing Policies and Procedures Data scrubbing is valuable to companies in one of two ways. The first can be used to clean data prior to being analyzed for report generation. The second is that it allows companies to remove things like personally Identifiable information from its data prior to transmit it between multiple environments or if the information is sent to an external location. An example of this can be seen with medical records in regards to HIPPA laws that prohibit the storage of specific personal and medical information. Additionally, I have professionally run in to a scenario where the Canadian government does not allow any Canadian’s personal information to be stored on a server not located in Canada. Encryption Practices The use of encryption is very valuable when a company needs to any personal information. This allows users with the appropriated access levels to view or confirm the existence or accuracy of data within a system by either decrypting the information or encrypting a piece of data and comparing it to the stored version.  Additionally, if for some unforeseen reason the data got in to the wrong hands then they would have to first decrypt the data before they could even be able to read it. Encryption just adds and additional layer of protection around data itself. Standard Normalization Practices The use of standard data normalization practices is very important when dealing with data because it can prevent allot of potential issues by eliminating the potential for unnecessary data duplication. Issues caused by data duplication include excess use of data storage, increased chance for invalidated data, and over use of data processing. Network and Database Security/Access Policies Every company has some form of network/data access policy even if they have none. These policies help secure data from being seen by inappropriate users along with preventing the data from being updated or deleted by users. In addition, without a good security policy there is a large potential for data to be corrupted by unassuming users or even stolen. Data Storage Policies Data storage polices are very important depending on how they are implemented especially when a company is trying to utilize them in conjunction with other policies like Data Backups. I have worked at companies where all network user folders are constantly backed up, and if a user wanted to ensure the existence of a piece of data in the form of a file then they had to store that file in their network folder. Conversely, I have also worked in places where when a user logs on or off of the network there entire user profile is backed up. Training Policies One of the biggest ways to prevent data loss and ensure that data will remain a company asset is through training. The practice of properly train employees on how to work with in systems that access data is crucial when trying to ensure a company’s data will remain an asset. Users need to be trained on how to manipulate a company’s data in order to perform their tasks to reduce the chances of invalidating data.

    Read the article

  • Synchronizing local and remote cache in distributed caching

    - by ltfishie
    With a distributed cache, a subset of the cache is kept locally while the rest is held remotely. In a get operation, if the entry is not available locally, the remote cache will be used and and the entry is added to local cache. In a put operation, both the local cache and remote cache are updated. Other nodes in the cluster also need to be notified to invalidate their local cache as well. What's a simplest way to achieve this if you implemented it yourself, assuming that nodes are not aware of each other. Edit My current implementation goes like this: Each cache entry contains a time stamp. Put operation will update local cache and remote cache Get operation will try local cache then remote cache A background thread on each node will check remote cache periodically for each entry in local cache. If the timestamp on remote is newer overwrite the local. If entry is not found in remote, delete it from local.

    Read the article

  • Scalable distributed file system for blobs like images and other documents

    - by Pinnacle
    Cassandra & HBase both do not efficiently support storage of blobs like images. Storing directly on HDFS stresses the Namenode because of huge number of files. Facebook uses Haystack for images and attachments storage, but this is not open source. So is Lustre a good choice for distributed blob storage? I have read that Amazon S3 is used by many, but this would cost money and personally, I would not like to rely on third party system. What are other suggestions?

    Read the article

  • Importing Multiple Schemas to a Model in Oracle SQL Developer Data Modeler

    - by thatjeffsmith
    Your physical data model might stretch across multiple Oracle schemas. Or maybe you just want a single diagram containing tables, views, etc. spanning more than a single user in the database. The process for importing a data dictionary is the same, regardless if you want to suck in objects from one schema, or many schemas. Let’s take a quick look at how to get started with a data dictionary import. I’m using Oracle SQL Developer in this example. The process is nearly identical in Oracle SQL Developer Data Modeler – the only difference being you’ll use the ‘File’ menu to get started versus the ‘File – Data Modeler’ menu in SQL Developer. Remember, the functionality is exactly the same whether you use SQL Developer or SQL Developer Data Modeler when it comes to the data modeling features – you’ll just have a cleaner user interface in SQL Developer Data Modeler. Importing a Data Dictionary to a Model You’ll want to open or create your model first. You can import objects to an existing or new model. The easiest way to get started is to simply open the ‘Browser’ under the View menu. The Browser allows you to navigate your open designs/models You’ll see an ‘Untitled_1′ model by default. I’ve renamed mine to ‘hr_sh_scott_demo.’ Now go back to the File menu, and expand the ‘Data Modeler’ section, and select ‘Import – Data Dictionary.’ This is a fancy way of saying, ‘suck objects out of the database into my model’ Connect! If you haven’t already defined a connection to the database you want to reverse engineer, you’ll need to do that now. I’m going to assume you already have that connection – so select it, and hit the ‘Next’ button. Select the Schema(s) to be imported Select one or more schemas you want to import The schemas selected on this page of the wizard will dictate the lists of tables, views, synonyms, and everything else you can choose from in the next wizard step to import. For brevity, I have selected ALL tables, views, and synonyms from 3 different schemas: HR SCOTT SH Once I hit the ‘Finish’ button in the wizard, SQL Developer will interrogate the database and add the objects to our model. The Big Model and the 3 Little Models I can now see ALL of the objects I just imported in the ‘hr_sh_scott_demo’ relational model in my design tree, and in my relational diagram. Quick Tip: Oracle SQL Developer calls what most folks think of as a ‘Physical Model’ the ‘Relational Model.’ Same difference, mostly. In SQL Developer, a Physical model allows you to define partitioning schemes, advanced storage parameters, and add your PL/SQL code. You can have multiple physical models per relational models. For example I might have a 4 Node RAC in Production that uses partitioning, but in test/dev, only have a single instance with no partitioning. I can have models for both of those physical implementations. The list of tables in my relational model Wouldn’t it be nice if I could segregate the objects based on their schema? Good news, you can! And it’s done by default Several of you might already know where I’m going with this – SUBVIEWS. You can easily create a ‘SubView’ by selecting one or more objects in your model or diagram and add them to a new SubView. SubViews are just mini-models. They contain a subset of objects from the main model. This is very handy when you want to break your model into smaller, more digestible parts. The model information is identical across the model and subviews, so you don’t have to worry about making a change in one place and not having it propagate across your design. SubViews can be used as filters when you create reports and exports as well. So instead of generating a PDF for everything, just show me what’s in my ‘ABC’ subview. But, I don’t want to do any work! Remember, I’m really lazy. More good news – it’s already done by default! The schemas are automatically used to create default SubViews Auto-Navigate to the Object in the Diagram In the subview tree node, right-click on the object you want to navigate to. You can ask to be taken to the main model view or to the SubView location. If you haven’t already opened the SubView in the diagram, it will be automatically opened for you. The SubView diagram only contains the objects from that SubView Your SubView might still be pretty big, many dozens of objects, so don’t forget about the ‘Navigator‘ either! In summary, use the ‘Import’ feature to add existing database objects to your model. If you import from multiple schemas, take advantage of the default schema based SubViews to help you manage your models! Sometimes less is more!

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >