Search Results

Search found 60427 results on 2418 pages for 'data transfer'.

Page 3/2418 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Deploying Data Mining Models using Model Export and Import

    - by [email protected]
    In this post, we'll take a look at how Oracle Data Mining facilitates model deployment. After building and testing models, a next step is often putting your data mining model into a production system -- referred to as model deployment. The ability to move data mining model(s) easily into a production system can greatly speed model deployment, and reduce the overall cost. Since Oracle Data Mining provides models as first class database objects, models can be manipulated using familiar database techniques and technology. For example, one or more models can be exported to a flat file, similar to a database table dump file (.dmp). This file can be moved to a different instance of Oracle Database EE, and then imported. All methods for exporting and importing models are based on Oracle Data Pump technology and found in the DBMS_DATA_MINING package. Before performing the actual export or import, a directory object must be created. A directory object is a logical name in the database for a physical directory on the host computer. Read/write access to a directory object is necessary to access the host computer file system from within Oracle Database. For our example, we'll work in the DMUSER schema. First, DMUSER requires the privilege to create any directory. This is often granted through the sysdba account. grant create any directory to dmuser; Now, DMUSER can create the directory object specifying the path where the exported model file (.dmp) should be placed. In this case, on a linux machine, we have the directory /scratch/oracle. CREATE OR REPLACE DIRECTORY dmdir AS '/scratch/oracle'; If you aren't sure of the exact name of the model or models to export, you can find the list of models using the following query: select model_name from user_mining_models; There are several options when exporting models. We can export a single model, multiple models, or all models in a schema using the following procedure calls: BEGIN   DBMS_DATA_MINING.EXPORT_MODEL ('MY_MODEL.dmp','dmdir','name =''MY_DT_MODEL'''); END; BEGIN   DBMS_DATA_MINING.EXPORT_MODEL ('MY_MODELS.dmp','dmdir',              'name IN (''MY_DT_MODEL'',''MY_KM_MODEL'')'); END; BEGIN   DBMS_DATA_MINING.EXPORT_MODEL ('ALL_DMUSER_MODELS.dmp','dmdir'); END; A .dmp file can be imported into another schema or database using the following procedure call, for example: BEGIN   DBMS_DATA_MINING.IMPORT_MODEL('MY_MODELS.dmp', 'dmdir'); END; As with models from any data mining tool, when moving a model from one environment to another, care needs to be taken to ensure the transformations that prepare the data for model building are matched (with appropriate parameters and statistics) in the system where the model is deployed. Oracle Data Mining provides automatic data preparation (ADP) and embedded data preparation (EDP) to reduce, or possibly eliminate, the need to explicitly transport transformations with the model. In the case of ADP, ODM automatically prepares the data and includes the necessary transformations in the model itself. In the case of EDP, users can associate their own transformations with attributes of a model. These transformations are automatically applied when applying the model to data, i.e., scoring. Exporting and importing a model with ADP or EDP results in these transformations being immediately available with the model in the production system.

    Read the article

  • Why Oracle Data Integrator for Big Data?

    - by Mala Narasimharajan
    Big Data is everywhere these days - but what exactly is it? It’s data that comes from a multitude of sources – not only structured data, but unstructured data as well.  The sheer volume of data is mindboggling – here are a few examples of big data: climate information collected from sensors, social media information, digital pictures, log files, online video files, medical records or online transaction records.  These are just a few examples of what constitutes big data.   Embedded in big data is tremendous value and being able to manipulate, load, transform and analyze big data is key to enhancing productivity and competitiveness.  The value of big data lies in its propensity for greater in-depth analysis and data segmentation -- in turn giving companies detailed information on product performance, customer preferences and inventory.  Furthermore, by being able to store and create more data in digital form, “big data can unlock significant value by making information transparent and usable at much higher frequency." (McKinsey Global Institute, May 2011) Oracle's flagship product for bulk data movement and transformation, Oracle Data Integrator, is a critical component of Oracle’s Big Data strategy. ODI provides automation, bulk loading, and validation and transformation capabilities for Big Data while minimizing the complexities of using Hadoop.  Specifically, the advantages of ODI in a Big Data scenario are due to pre-built Knowledge Modules that drive processing in Hadoop. This leverages the graphical UI to load and unload data from Hadoop, perform data validations and create mapping expressions for transformations.  The Knowledge Modules provide a key jump-start and eliminate a significant amount of Hadoop development.  Using Oracle Data Integrator together with Oracle Big Data Connectors, you can simplify the complexities of mapping, accessing, and loading big data (via NoSQL or HDFS) but also correlating your enterprise data – this correlation may require integrating across heterogeneous and standards-based environments, connecting to Oracle Exadata, or sourcing via a big data platform such as Oracle Big Data Appliance. To learn more about Oracle Data Integration and Big Data, download our resource kit to see the latest in whitepapers, webinars, downloads, and more… or go to our website on www.oracle.com/bigdata

    Read the article

  • General Policies and Procedures for Maintaining the Value of Data Assets

    Here is a general list for policies and procedures regarding maintaining the value of data assets. Data Backup Policies and Procedures Backups are very important when dealing with data because there is always the chance of losing data due to faulty hardware or a user activity. So the need for a strategic backup system should be mandatory for all companies. This being said, in the real world some companies that I have worked for do not really have a good data backup plan. Typically when companies tend to take this kind of approach in data backups usually the data is not really recoverable.  Unfortunately when companies do not regularly test their backup plans they get a false sense of security because they think that they are covered. However, I can tell you from personal and professional experience that a backup plan/system is never fully implemented until it is regularly tested prior to the time when it actually needs to be used. Disaster Recovery Plan Expanding on Backup Policies and Procedures, a company needs to also have a disaster recovery plan in order to protect its data in case of a catastrophic disaster.  Disaster recovery plans typically encompass how to restore all of a company’s data and infrastructure back to a restored operational status.  Most Disaster recovery plans also include time estimates on how long each step of the disaster recovery plan should take to be executed.  It is important to note that disaster recovery plans are never fully implemented until they have been tested just like backup plans. Disaster recovery plans should be tested regularly so that the business can be confident in not losing any or minimal data due to a catastrophic disaster. Firewall Policies and Content Filters One way companies can protect their data is by using a firewall to separate their internal network from the outside. Firewalls allow for enabling or disabling network access as data passes through it by applying various defined restrictions. Furthermore firewalls can also be used to prevent access from the internal network to the outside by these same factors. Common Firewall Restrictions Destination/Sender IP Address Destination/Sender Host Names Domain Names Network Ports Companies can also desire to restrict what their network user’s view on the internet through things like content filters. Content filters allow a company to track what webpages a person has accessed and can also restrict user’s access based on established rules set up in the content filter. This device and/or software can block access to domains or specific URLs based on a few factors. Common Content Filter Criteria Known malicious sites Specific Page Content Page Content Theme  Anti-Virus/Mal-ware Polices Fortunately, most companies utilize antivirus programs on all computers and servers for good reason, virus have been known to do the following: Corrupt/Invalidate Data, Destroy Data, and Steal Data. Anti-Virus applications are a great way to prevent any malicious application from being able to gain access to a company’s data.  However, anti-virus programs must be constantly updated because new viruses are always being created, and the anti-virus vendors need to distribute updates to their applications so that they can catch and remove them. Data Validation Policies and Procedures Data validation is very important to ensure that only accurate information is stored. The existence of invalid data can cause major problems when businesses attempt to use data for knowledge based decisions and for performance reporting. Data Scrubbing Policies and Procedures Data scrubbing is valuable to companies in one of two ways. The first can be used to clean data prior to being analyzed for report generation. The second is that it allows companies to remove things like personally Identifiable information from its data prior to transmit it between multiple environments or if the information is sent to an external location. An example of this can be seen with medical records in regards to HIPPA laws that prohibit the storage of specific personal and medical information. Additionally, I have professionally run in to a scenario where the Canadian government does not allow any Canadian’s personal information to be stored on a server not located in Canada. Encryption Practices The use of encryption is very valuable when a company needs to any personal information. This allows users with the appropriated access levels to view or confirm the existence or accuracy of data within a system by either decrypting the information or encrypting a piece of data and comparing it to the stored version.  Additionally, if for some unforeseen reason the data got in to the wrong hands then they would have to first decrypt the data before they could even be able to read it. Encryption just adds and additional layer of protection around data itself. Standard Normalization Practices The use of standard data normalization practices is very important when dealing with data because it can prevent allot of potential issues by eliminating the potential for unnecessary data duplication. Issues caused by data duplication include excess use of data storage, increased chance for invalidated data, and over use of data processing. Network and Database Security/Access Policies Every company has some form of network/data access policy even if they have none. These policies help secure data from being seen by inappropriate users along with preventing the data from being updated or deleted by users. In addition, without a good security policy there is a large potential for data to be corrupted by unassuming users or even stolen. Data Storage Policies Data storage polices are very important depending on how they are implemented especially when a company is trying to utilize them in conjunction with other policies like Data Backups. I have worked at companies where all network user folders are constantly backed up, and if a user wanted to ensure the existence of a piece of data in the form of a file then they had to store that file in their network folder. Conversely, I have also worked in places where when a user logs on or off of the network there entire user profile is backed up. Training Policies One of the biggest ways to prevent data loss and ensure that data will remain a company asset is through training. The practice of properly train employees on how to work with in systems that access data is crucial when trying to ensure a company’s data will remain an asset. Users need to be trained on how to manipulate a company’s data in order to perform their tasks to reduce the chances of invalidating data.

    Read the article

  • Slow network file transfer (under 20KB/s) on newly built x64 Win7

    - by Mangoshake
    I am getting <20KB/s for local network file transfer. If I transfer a very small file (less than 100KB) it would start quickly then slow down to <20KB/s. all subsequently network file transfer would be slow, a reboot is needed to reset this. If I transfer a large file it would be stuck on calculating for a long time and then begin with <20KB/s immediately. This is a newly built desktop running Windows 7 x64 SP1. Realtek gigabit LAN from the motherboard (ASRock Extreme3 gen3). Problematic speed is observed on the private LAN, both through ethernet and WiFi. The Router is D-Link DIR-655. Remote Differential Compression is off. Drivers are up-to-date from ASRock's website. I have tested network file transfer to and from another Windows 7 laptop and a MacBook Pro, so I am fairly certain it is the desktop's problem. The slow speed only happens with one direction also, outbound from the desktop, regardless of whether I initiate the file transfer action from the origin or the destination. Inbound network file transfer and internet speeds are fine, so I don't think this is a hardware issue. I am getting 74.8MB/s internet upload speed from speedtest.net (http://www.speedtest.net/result/1852752479.png). Inbound network file transfer I can get around 10-15MB/s. I am hoping this community has some insight for me to troubleshoot this. I don't see anything obviously related from the Event Viewer, and beyond that I just don't know where else to look. Any suggestions are greatly appreciated, thank you in advance.

    Read the article

  • Importing Multiple Schemas to a Model in Oracle SQL Developer Data Modeler

    - by thatjeffsmith
    Your physical data model might stretch across multiple Oracle schemas. Or maybe you just want a single diagram containing tables, views, etc. spanning more than a single user in the database. The process for importing a data dictionary is the same, regardless if you want to suck in objects from one schema, or many schemas. Let’s take a quick look at how to get started with a data dictionary import. I’m using Oracle SQL Developer in this example. The process is nearly identical in Oracle SQL Developer Data Modeler – the only difference being you’ll use the ‘File’ menu to get started versus the ‘File – Data Modeler’ menu in SQL Developer. Remember, the functionality is exactly the same whether you use SQL Developer or SQL Developer Data Modeler when it comes to the data modeling features – you’ll just have a cleaner user interface in SQL Developer Data Modeler. Importing a Data Dictionary to a Model You’ll want to open or create your model first. You can import objects to an existing or new model. The easiest way to get started is to simply open the ‘Browser’ under the View menu. The Browser allows you to navigate your open designs/models You’ll see an ‘Untitled_1′ model by default. I’ve renamed mine to ‘hr_sh_scott_demo.’ Now go back to the File menu, and expand the ‘Data Modeler’ section, and select ‘Import – Data Dictionary.’ This is a fancy way of saying, ‘suck objects out of the database into my model’ Connect! If you haven’t already defined a connection to the database you want to reverse engineer, you’ll need to do that now. I’m going to assume you already have that connection – so select it, and hit the ‘Next’ button. Select the Schema(s) to be imported Select one or more schemas you want to import The schemas selected on this page of the wizard will dictate the lists of tables, views, synonyms, and everything else you can choose from in the next wizard step to import. For brevity, I have selected ALL tables, views, and synonyms from 3 different schemas: HR SCOTT SH Once I hit the ‘Finish’ button in the wizard, SQL Developer will interrogate the database and add the objects to our model. The Big Model and the 3 Little Models I can now see ALL of the objects I just imported in the ‘hr_sh_scott_demo’ relational model in my design tree, and in my relational diagram. Quick Tip: Oracle SQL Developer calls what most folks think of as a ‘Physical Model’ the ‘Relational Model.’ Same difference, mostly. In SQL Developer, a Physical model allows you to define partitioning schemes, advanced storage parameters, and add your PL/SQL code. You can have multiple physical models per relational models. For example I might have a 4 Node RAC in Production that uses partitioning, but in test/dev, only have a single instance with no partitioning. I can have models for both of those physical implementations. The list of tables in my relational model Wouldn’t it be nice if I could segregate the objects based on their schema? Good news, you can! And it’s done by default Several of you might already know where I’m going with this – SUBVIEWS. You can easily create a ‘SubView’ by selecting one or more objects in your model or diagram and add them to a new SubView. SubViews are just mini-models. They contain a subset of objects from the main model. This is very handy when you want to break your model into smaller, more digestible parts. The model information is identical across the model and subviews, so you don’t have to worry about making a change in one place and not having it propagate across your design. SubViews can be used as filters when you create reports and exports as well. So instead of generating a PDF for everything, just show me what’s in my ‘ABC’ subview. But, I don’t want to do any work! Remember, I’m really lazy. More good news – it’s already done by default! The schemas are automatically used to create default SubViews Auto-Navigate to the Object in the Diagram In the subview tree node, right-click on the object you want to navigate to. You can ask to be taken to the main model view or to the SubView location. If you haven’t already opened the SubView in the diagram, it will be automatically opened for you. The SubView diagram only contains the objects from that SubView Your SubView might still be pretty big, many dozens of objects, so don’t forget about the ‘Navigator‘ either! In summary, use the ‘Import’ feature to add existing database objects to your model. If you import from multiple schemas, take advantage of the default schema based SubViews to help you manage your models! Sometimes less is more!

    Read the article

  • download speed is fast but transfer rate is slow

    - by Ieyasu Sawada
    I've just changed ISP and I'm pretty disappointed with the transfer rate. My previous connection has a download speed of 1.08 Mb/s as seen from this site: http://speedtest.net and the download transfer rate is about 100kb/s for sites that doesn't limit their bandwidth. Now my connection has about 2Mb/s download speed but the transfer rate is dancing from 20-50kb/s . I was expecting a speed much higher than this because of the download speed that I'm getting when I'm testing. The question is what's the difference between transfer rate and download speed, is it normal to have a high download speed but low transfer rate, should the download speed be proportional to the transfer rate?

    Read the article

  • Free Domain hosting configurations and transfer

    - by upog
    I have registered a new domain name with GODaddy.com now i would like to host my domain for free. Assume the app is a basic HTML page.I have done some search and decided to host it under google app engine I am looking answers for few question currently my domain name is managed by GODaddy, how can i transfer it to Google app engine, so that going forward it will be managed by Google How can i configure the new domain in Google app engine and associate with my domain name Is there any indirect cost involved in domain hosting service hosted by Google app engine Any suggestion for free and reliable hosting is also welcomed Update Can i host free web page in cloud.google.com ?

    Read the article

  • USB file transfer preparing to copy, but file count climbs indefinitely

    - by Alex
    I have downloaded Ubuntu 12.04 to back up my Windows machine that won't boot and am running Ubuntu from the CD. I copied and pasted a volume of about 160GB to my external HDD. The transfer has been stuck in the "preparing to copy" stage for several hours and is displaying a file/GB count about twice as high as the volume of data actually being copied! The number is now larger than the entire partition that's being copied from! However I know it's doing something because occasionally it pops up with a minor I/O error on this or that file which I then have to click through. I've not had this problem before so I can only assume it's a Linux/Ubuntu thing. More importantly what I want to know is is there any other way to copy it across that will actually work?

    Read the article

  • Slow transfer to external USB3 hard drive

    - by JMP
    Trying to backup data from hard drive before reloading windows following some issue with its load. Having trouble with the file transfer to a USB3/2 external hard drive NTFS. Getting transfer speed of about 116.7kB/sec. In other words its taking about 5 hours to transfer 1.4GB. I've got about 80GB to go. So the transfer is going to take 11days. Seems a little on the slow side. Am I missing something? Is there a way to make this faster. No issue with the external drive transferring this amount in windows. But don't have that option at the moment.

    Read the article

  • Deploying Data Mining Models using Model Export and Import, Part 2

    - by [email protected]
    In my last post, Deploying Data Mining Models using Model Export and Import, we explored using DBMS_DATA_MINING.EXPORT_MODEL and DBMS_DATA_MINING.IMPORT_MODEL to enable moving a model from one system to another. In this post, we'll look at two distributed scenarios that make use of this capability and a tip for easily moving models from one machine to another using only Oracle Database, not an external file transport mechanism, such as FTP. The first scenario, consider a company with geographically distributed business units, each collecting and managing their data locally for the products they sell. Each business unit has in-house data analysts that build models to predict which products to recommend to customers in their space. A central telemarketing business unit also uses these models to score new customers locally using data collected over the phone. Since the models recommend different products, each customer is scored using each model. This is depicted in Figure 1.Figure 1: Target instance importing multiple remote models for local scoring In the second scenario, consider multiple hospitals that collect data on patients with certain types of cancer. The data collection is standardized, so each hospital collects the same patient demographic and other health / tumor data, along with the clinical diagnosis. Instead of each hospital building it's own models, the data is pooled at a central data analysis lab where a predictive model is built. Once completed, the model is distributed to hospitals, clinics, and doctor offices who can score patient data locally.Figure 2: Multiple target instances importing the same model from a source instance for local scoring Since this blog focuses on model export and import, we'll only discuss what is necessary to move a model from one database to another. Here, we use the package DBMS_FILE_TRANSFER, which can move files between Oracle databases. The script is fairly straightforward, but requires setting up a database link and directory objects. We saw how to create directory objects in the previous post. To create a database link to the source database from the target, we can use, for example: create database link SOURCE1_LINK connect to <schema> identified by <password> using 'SOURCE1'; Note that 'SOURCE1' refers to the service name of the remote database entry in your tnsnames.ora file. From SQL*Plus, first connect to the remote database and export the model. Note that the model_file_name does not include the .dmp extension. This is because export_model appends "01" to this name.  Next, connect to the local database and invoke DBMS_FILE_TRANSFER.GET_FILE and import the model. Note that "01" is eliminated in the target system file name.  connect <source_schema>/<password>@SOURCE1_LINK; BEGIN  DBMS_DATA_MINING.EXPORT_MODEL ('EXPORT_FILE_NAME' || '.dmp',                                 'MY_SOURCE_DIR_OBJECT',                                 'name =''MY_MINING_MODEL'''); END; connect <target_schema>/<password>; BEGIN  DBMS_FILE_TRANSFER.GET_FILE ('MY_SOURCE_DIR_OBJECT',                               'EXPORT_FILE_NAME' || '01.dmp',                               'SOURCE1_LINK',                               'MY_TARGET_DIR_OBJECT',                               'EXPORT_FILE_NAME' || '.dmp' );  DBMS_DATA_MINING.IMPORT_MODEL ('EXPORT_FILE_NAME' || '.dmp',                                 'MY_TARGET_DIR_OBJECT'); END; To clean up afterward, you may want to drop the exported .dmp file at the source and the transferred file at the target. For example, utl_file.fremove('&directory_name', '&model_file_name' || '.dmp');

    Read the article

  • Abstract Data Type and Data Structure

    - by mark075
    It's quite difficult for me to understand these terms. I searched on google and read a little on Wikipedia but I'm still not sure. I've determined so far that: Abstract Data Type is a definition of new type, describes its properties and operations. Data Structure is an implementation of ADT. Many ADT can be implemented as the same Data Structure. If I think right, array as ADT means a collection of elements and as Data Structure, how it's stored in a memory. Stack is ADT with push, pop operations, but can we say about stack data structure if I mean I used stack implemented as an array in my algorithm? And why heap isn't ADT? It can be implemented as tree or an array.

    Read the article

  • How to speed up file transfer to/from Ubuntu Server 11.10 (wifi)

    - by Alexander
    I've been searching AU & elsewhere for the last day and a half. Haven't found an answer so I joined AU to ask for help. I'm hoping someone can point me in the right direction. Ubuntu Server 11.10 Samba VSFTPD Windows 7 PC 2 MacBook Pro - Snow Leopard/Lion 1 iMac - Lion Wireless LAN using DLink DIR-655 Link Speed: 195 Mbit/s on Mac - 54Mbps on Windows ISP Connection: Cable - 20 down/3 up No Domain Controller. All machines are members of the same workgroup. No matter how I connect I can't get better than about 700K transfer rate up/down. Mac/PC, SMB/ftp, Domain Name/Local IP I've tried different user accounts and using different folders, different volumes on the server. Nothing seems to make a difference. 700K up/down. Period. Any suggestions would be greatly appreciated. Thanks, Alexander EDIT: Using sftp now and uploading seems to peak at 980k. After about 5 minutes into a 650MB file, downloading is at 1072k and climbing about 500b/s every ten seconds. If any of that matters... I was expecting a lot faster than 1Mb tx rate. Am I off base here? EDIT: From all I've read so far, perhaps the speed isn't that bad. I only installed Ubuntu out of boredom this past weekend. The trouble is, I like it. Guess it's time to ditch the wifi and run some Cat 5.

    Read the article

  • Big Data – Various Learning Resources – How to Start with Big Data? – Day 20 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned how to become a Data Scientist for Big Data. In this article we will go over various learning resources related to Big Data. In this series we have covered many of the most essential details about Big Data. At the beginning of this series, I have encouraged readers to send me questions. One of the most popular questions is - “I want to learn more about Big Data. Where can I learn it?” This is indeed a great question as there are plenty of resources out to learn about Big Data and it is indeed difficult to select on one resource to learn Big Data. Hence I decided to write here a few of the very important resources which are related to Big Data. Learn from Pluralsight Pluralsight is a global leader in high-quality online training for hardcore developers.  It has fantastic Big Data Courses and I started to learn about Big Data with the help of Pluralsight. Here are few of the courses which are directly related to Big Data. Big Data: The Big Picture Big Data Analytics with Tableau NoSQL: The Big Picture Understanding NoSQL Data Analysis Fundamentals with Tableau I encourage all of you start with this video course as they are fantastic fundamentals to learn Big Data. Learn from Apache Resources at Apache are single point the most authentic learning resources. If you want to learn fundamentals and go deep about every aspect of the Big Data, I believe you must understand various concepts in Apache’s library. I am pretty impressed with the documentation and I am personally referencing it every single day when I work with Big Data. I strongly encourage all of you to bookmark following all the links for authentic big data learning. Haddop - The Apache Hadoop® project develops open-source software for reliable, scalable, distributed computing. Ambari: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which include support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heat maps and ability to view MapReduce, Pig and Hive applications visually along with features to diagnose their performance characteristics in a user-friendly manner. Avro: A data serialization system. Cassandra: A scalable multi-master database with no single points of failure. Chukwa: A data collection system for managing large distributed systems. HBase: A scalable, distributed database that supports structured data storage for large tables. Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying. Mahout: A Scalable machine learning and data mining library. Pig: A high-level data-flow language and execution framework for parallel computation. ZooKeeper: A high-performance coordination service for distributed applications. Learn from Vendors One of the biggest issues with about learning Big Data is setting up the environment. Every Big Data vendor has different environment request and there are lots of things require to set up Big Data framework. Many of the users do not start with Big Data as they are afraid about the resources required to set up framework as well as a time commitment. Here Hortonworks have created fantastic learning environment. They have created Sandbox with everything one person needs to learn Big Data and also have provided excellent tutoring along with it. Sandbox comes with a dozen hands-on tutorial that will guide you through the basics of Hadoop as well it contains the Hortonworks Data Platform. I think Hortonworks did a fantastic job building this Sandbox and Tutorial. Though there are plenty of different Big Data Vendors I have decided to list only Hortonworks due to their unique setup. Please leave a comment if there are any other such platform to learn Big Data. I will include them over here as well. Learn from Books There are indeed few good books out there which one can refer to learn Big Data. Here are few good books which I have read. I will update the list as I will learn more. Ethics of Big Data Balancing Risk and Innovation Big Data for Dummies Head First Data Analysis: A Learner’s Guide to Big Numbers, Statistics, and Good Decisions If you search on Amazon there are millions of the books but I think above three books are a great set of books and it will give you great ideas about Big Data. Once you go through above books, you will have a clear idea about what is the next step you should follow in this series. You will be capable enough to make the right decision for yourself. Tomorrow In tomorrow’s blog post we will wrap up this series of Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Data Storage Options

    - by Kenneth
    When I was working as a website designer/engineer I primarily used databases for storage of much of my dynamic data. It was very easy and convenient to use this method and seemed like a standard practice from my research on the matter. I'm now working on shifting away from websites and into desktop applications. What are the best practices for data storage for desktop applications? I ask because I have noticed that most programs I use on a personal level don't appear to use a database for data storage unless its embedded in the program. (I'm not thinking of an application like a word processor where it makes sense to have data stored in individual files as defined by the user. Rather I'm thinking of something more along the lines of a calendar application which would need to store dates and event info and such where accessing that information would be much easier if stored in a database... at least as far as my experience would indicate.) Thanks for the input!

    Read the article

  • What is a Data Warehouse?

    Typically Data Warehouses are considered to be non-volatile in comparison to traditional databasesdue to the fact that data within the warehouse does not change that often.  In addition, Data Warehouses typically represent data through the use of Multidimensional Conceptual Views that allow data to be extracted based on the view and the current position within the view. Common Data Warehouse Traits Relatively Non-volatile Data Supports Data Extraction and Analysis Optimized for Data Retrieval and Analysis Multidimensional Views of Data Flexible Reporting Multi User Support Generic Dimensionality Transparent Accessible Unlimited Dimensions of Data Unlimited Aggregation levels of Data Normally, Data Warehouses are much larger then there traditional database counterparts due to the fact that they store the basis data along with derived data via Multidimensional Conceptual Views. As companies store larger and larger amounts of data, they will need a way to effectively and accurately extract analysis information that can be used to aide in formulating current and future business decisions. This process can be done currently through data mining within a Data Warehouse. Data Warehouses provide access to data derived through complex analysis, knowledge discovery and decision making. Secondly, they support the demands for high performance in regards to analyzing an organization’s existing and current data. Data Warehouses provide support for an organization’s data and acquired business knowledge.  Within a Data Warehouse multiple types of operations/sub systems are supported. Common Data Warehouse Sub Systems Online Analytical Processing (OLAP) Decision –Support Systems (DSS) Online Transaction Processing (OLTP)

    Read the article

  • Move data from others user accounts in my user account

    - by user118136
    I had problems with compiz setting and I make multiple accounts, now I want to transfer my information from all deleted users in my current account, some data I can not copy because I am not right to read, I type in terminal "sudo nautilus" and I get the permission for read, but the copied data is available only for superusers and I must charge the permissions for each file and each folder. How I can copy the information with out the superuser rights OR how I can charge the permissions for selected folder and all files and folders included in it?

    Read the article

  • How Do You Actually Model Data?

    Since the 1970’s Developers, Analysts and DBAs have been able to represent concepts and relations in the form of data through the use of generic symbols.  But what is data modeling?  The first time I actually heard this term I could not understand why anyone would want to display a computer on a fashion show runway. Hey, what do you expect? At that time I was a freshman in community college, and obviously this was a long time ago.  I have since had the chance to learn what data modeling truly is through using it. Data modeling is a process of breaking down information and/or requirements in to common categories called objects. Once objects start being defined then relationships start to form based on dependencies found amongst other existing objects.  Currently, there are several tools on the market that help data designer actually map out objects and their relationships through the use of symbols and lines.  These diagrams allow for designs to be review from several perspectives so that designers can ensure that they have the optimal data design for their project and that the design is flexible enough to allow for potential changes and/or extension in the future. Additionally these basic models can always be further refined to show different levels of details depending on the target audience through the use of three different types of models. Conceptual Data Model(CDM)Conceptual Data Models include all key entities and relationships giving a viewer a high level understanding of attributes. Conceptual data model are created by gathering and analyzing information from various sources pertaining to a project during the typical planning phase of a project. Logical Data Model (LDM)Logical Data Models are conceptual data models that have been expanded to include implementation details pertaining to the data that it will store. Additionally, this model typically represents an origination’s business requirements and business rules by defining various attribute data types and relationships regarding each entity. This additional information can be directly translated to the Physical Data Model which reduces the actual time need to implement it. Physical Data Model(PDMs)Physical Data Model are transformed Logical Data Models that include the necessary tables, columns, relationships, database properties for the creation of a database. This model also allows for considerations regarding performance, indexing and denormalization that are applied through database rules, data integrity. Further expanding on why we actually use models in modern application/database development can be seen in the benefits that data modeling provides for data modelers and projects themselves, Benefits of Data Modeling according to Applied Information Science Abstraction that allows data designers remove concepts and ideas form hard facts in the form of data. This gives the data designers the ability to express general concepts and/or ideas in a generic form through the use of symbols to represent data items and the relationships between the items. Transparency through the use of data models allows complex ideas to be translated in to simple symbols so that the concept can be understood by all viewpoints and limits the amount of confusion and misunderstanding. Effectiveness in regards to tuning a model for acceptable performance while maintaining affordable operational costs. In addition it allows systems to be built on a solid foundation in terms of data. I shudder at the thought of a world without data modeling, think about it? Data is everywhere in our lives. Data modeling allows for optimizing a design for performance and the reduction of duplication. If one was to design a database without data modeling then I would think that the first things to get impacted would be database performance due to poorly designed database and there would be greater chances of unnecessary data duplication that would also play in to the excessive query times because unneeded records would need to be processed. You could say that a data designer designing a database is like a box of chocolates. You will never know what kind of database you will get until after it is built.

    Read the article

  • Data Structures usage and motivational aspects

    - by Aubergine
    For long student life I was always wondering why there are so many of them yet there seems to be lack of usage at all in many of them. The opinion didn't really change when I got a job. We have brilliant books on what they are and their complexities, but I never encounter resources which would actually give a good hint of practical usage. I perfectly understand that I have to look at problem , analyse required operations, look for data structure that does them efficiently. However in practice I never do that, not because of human laziness syndrome, but because when it comes to work I acknowledge time priority over self-development. Over time I thought that when I would be better developer I will automatically use more of them - that didn't happen at all or maybe I just didn't. Then I found that the colleagues usually in the same plate as me - knowing more or less some three of data structures and being totally happy about it and refusing to discuss this matter further with me, coming back to conversations about 'cool new languages' 'libraries that do jobs for you' and the joy to work under scrumban etc. I am stuck with ArrayLists, Arrays and SortedMap , which no matter what I do always suffice or either I tweak them to be capable of fulfilling my task. Yes, it might be inefficient but do we really have to care if Intel increases performance over years no matter if we improve our skills? Does new Xeon or IBM machines really care what we use? What if I like build things, but I am not particularly excited whether it is n log(n) or just n? Over twenty years the processing power increased enormously, which gives us freedom of not being critical about which one to use? On top of that new more optimized languages appear which support multiple cores more efficiently. To be more specific: I would like to find motivational material on complex real areas/cases of possible effective usages of data structures. I would be really grateful if you would provide relevant resources. There is similar question ,but in the end the links again mostly describe or do dumb example(vehicles, students or holy grail quest - yes, very relevant) them and people keep referring to the "scenario decides the data structure to use". I want to know these complex scenarios to be able to identify similarities to my scenario and then use them. The complex scenarios where it really matters and not necessarily of quantitive nature. It seems that data structures only concern is efficiency and nothing else? There seems to be no particular convenience for developer in use one over another. (only when I found scientific resources on why exactly simple carbohydrates are evil I stopped eating sugar and candies completely replacing it with less harmful fruits - I hope you can see the analogy)

    Read the article

  • Unleash AutoVue on Your Unmanaged Data

    - by [email protected]
    Over the years, I've spoken to hundreds of customers who use AutoVue to collaborate on their "managed" data stored in content management systems, product lifecycle management systems, etc. via our many integrations. Through these conversations I've also learned a harsh reality - we will never fully move away from unmanaged data (desktops, file servers, emails, etc). If you use AutoVue today you already know that even if your primary use is viewing content stored in a content management system, you can still open files stored locally on your computer. But did you know that AutoVue actually has - built-in - a great solution for viewing, printing and redlining your data stored on file servers? Using the 'Server protocol' you can point AutoVue directly to a top-level location on any networked file server and provide your users with a link or shortcut to access an interface similar to the sample page shown below. Many customers link to pages just like this one from their internal company intranets. Through this webpage, users can easily search and browse through file server data with a 'click-and-view' interface to find the specific image, document, drawing or model they're looking for. Any markups created on a document will be accessible to everyone else viewing that document and of course real-time collaboration is supported as well. Customers on maintenance can consult the AutoVue Admin guide or My Oracle Support Doc ID 753018.1 for an introduction to the server protocol. Contact your local AutoVue Solutions Consultant for help setting up the sample shown above.

    Read the article

  • SQL SERVER – Step by Step Guide to Beginning Data Quality Services in SQL Server 2012 – Introduction to DQS

    - by pinaldave
    Data Quality Services is a very important concept of SQL Server. I have recently started to explore the same and I am really learning some good concepts. Here are two very important blog posts which one should go over before continuing this blog post. Installing Data Quality Services (DQS) on SQL Server 2012 Connecting Error to Data Quality Services (DQS) on SQL Server 2012 This article is introduction to Data Quality Services for beginners. We will be using an Excel file Click on the image to enlarge the it. In the first article we learned to install DQS. In this article we will see how we can learn about building Knowledge Base and using it to help us identify the quality of the data as well help correct the bad quality of the data. Here are the two very important steps we will be learning in this tutorial. Building a New Knowledge Base  Creating a New Data Quality Project Let us start the building the Knowledge Base. Click on New Knowledge Base. In our project we will be using the Excel as a knowledge base. Here is the Excel which we will be using. There are two columns. One is Colors and another is Shade. They are independent columns and not related to each other. The point which I am trying to show is that in Column A there are unique data and in Column B there are duplicate records. Clicking on New Knowledge Base will bring up the following screen. Enter the name of the new knowledge base. Clicking NEXT will bring up following screen where it will allow to select the EXCE file and it will also let users select the source column. I have selected Colors and Shade both as a source column. Creating a domain is very important. Here you can create a unique domain or domain which is compositely build from Colors and Shade. As this is the first example, I will create unique domain – for Colors I will create domain Colors and for Shade I will create domain Shade. Here is the screen which will demonstrate how the screen will look after creating domains. Clicking NEXT it will bring you to following screen where you can do the data discovery. Clicking on the START will start the processing of the source data provided. Pre-processed data will show various information related to the source data. In our case it shows that Colors column have unique data whereas Shade have non-unique data and unique data rows are only two. In the next screen you can actually add more rows as well see the frequency of the data as the values are listed unique. Clicking next will publish the knowledge base which is just created. Now the knowledge base is created. We will try to take any random data and attempt to do DQS implementation over it. I am using another excel sheet here for simplicity purpose. In reality you can easily use SQL Server table for the same. Click on New Data Quality Project to see start DQS Project. In the next screen it will ask which knowledge base to use. We will be using our Colors knowledge base which we have recently created. In the Colors knowledge base we had two columns – 1) Colors and 2) Shade. In our case we will be using both of the mappings here. User can select one or multiple column mapping over here. Now the most important phase of the complete project. Click on Start and it will make the cleaning process and shows various results. In our case there were two columns to be processed and it completed the task with necessary information. It demonstrated that in Colors columns it has not corrected any value by itself but in Shade value there is a suggestion it has. We can train the DQS to correct values but let us keep that subject for future blog posts. Now click next and keep the domain Colors selected left side. It will demonstrate that there are two incorrect columns which it needs to be corrected. Here is the place where once corrected value will be auto-corrected in future. I manually corrected the value here and clicked on Approve radio buttons. As soon as I click on Approve buttons the rows will be disappeared from this tab and will move to Corrected Tab. If I had rejected tab it would have moved the rows to Invalid tab as well. In this screen you can see how the corrected 2 rows are demonstrated. You can click on Correct tab and see previously validated 6 rows which passed the DQS process. Now let us click on the Shade domain on the left side of the screen. This domain shows very interesting details as there DQS system guessed the correct answer as Dark with the confidence level of 77%. It is quite a high confidence level and manual observation also demonstrate that Dark is the correct answer. I clicked on Approve and the row moved to corrected tab. On the next screen DQS shows the summary of all the activities. It also demonstrates how the correction of the quality of the data was performed. The user can explore their data to a SQL Server Table, CSV file or Excel. The user also has an option to either explore data and all the associated cleansing info or data only. I will select Data only for demonstration purpose. Clicking explore will generate the files. Let us open the generated file. It will look as following and it looks pretty complete and corrected. Well, we have successfully completed DQS Process. The process is indeed very easy. I suggest you try this out yourself and you will find it very easy to learn. In future we will go over advanced concepts. Are you using this feature on your production server? If yes, would you please leave a comment with your environment and business need. It will be indeed interesting to see where it is implemented. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Business Intelligence, Data Warehousing, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Data Quality Services, DQS

    Read the article

  • Consolidate Data in Private Clouds, But Consider Security and Regulatory Issues

    - by Troy Kitch
    The January 13 webcast Security and Compliance for Private Cloud Consolidation will provide attendees with an overview of private cloud computing based on Oracle's Maximum Availability Architecture and how security and regulatory compliance affects implementations. Many organizations are taking advantage of Oracle's Maximum Availability Architecture to drive down the cost of IT by deploying private cloud computing environments that can support downtime and utilization spikes without idle redundancy. With two-thirds of sensitive and regulated data in organizations' databases private cloud database consolidation means organizations must be more concerned than ever about protecting their information and addressing new regulatory challenges. Join us for this webcast to learn about greater risks and increased threats to private cloud data and how Oracle Database Security Solutions can assist in securely consolidating data and meet compliance requirements. Register Now.

    Read the article

  • Validating Data Using Data Annotation Attributes in ASP.NET MVC

    - by bipinjoshi
    The data entered by the end user in various form fields must be validated before it is saved in the database. Developers often use validation HTML helpers provided by ASP.NET MVC to perform the input validations. Additionally, you can also use data annotation attributes from the System.ComponentModel.DataAnnotations namespace to perform validations at the model level. Data annotation attributes are attached to the properties of the model class and enforce some validation criteria. They are capable of performing validation on the server side as well as on the client side. This article discusses the basics of using these attributes in an ASP.NET MVC application.http://www.bipinjoshi.net/articles/0a53f05f-b58c-47b1-a544-f032f5cfca58.aspx       

    Read the article

  • How to Backup and Transfer Opera Settings, Profiles, and Browsing Sessions

    - by Lori Kaufman
    We’ve previously shown you how to backup Firefox profiles using an extension and third-party software and how to backup Google Chrome profiles. If you use Opera, there is a free tool that makes it easy to backup Opera profiles, settings, and even browsing sessions. Opera offers a sync service, called Opera Link, which allows you to sync your bookmarks, personal bar, history, Speed Dial, notes, and search engines with other computers. However, this service does not sync your current browsing sessions and passwords. We found a free tool, called Stu’s Opera Settings Import & Export tool, that allows you to export all your Opera settings, profiles, and browsing sessions to an archive and import it into Opera on the same or another computer. Stu’s Opera Settings Import & Export tool is portable and does not need to be installed. Simply download the .zip file using the link at the end of this article. Double-click the osie.exe file to run the program. 8 Deadly Commands You Should Never Run on Linux 14 Special Google Searches That Show Instant Answers How To Create a Customized Windows 7 Installation Disc With Integrated Updates

    Read the article

  • More Value From Data Using Data Mining Presentation

    Here is a presentation I gave at the SQLBits conference in September which was recorded by Microsoft.  Usually I speak about SSIS but on this particular event I thought people would like to hear something different from me. Microsoft are making a big play for making Data Mining more accessible to everyone and not just boffins.  In this presentation I give an overview of data mining and then do some demonstrations using the excellent Excel Add-Ins available from Microsoft SQL Server 2008 SQL Server 2005 I hope you enjoy this presentation http://go.microsoft.com/?linkid=9633764

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >