Search Results

Search found 61623 results on 2465 pages for 'data storage'.

Page 3/2465 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

Storage Configuration

- by jchang

Storage performance is not inherently complicated subject. The concepts are relatively simple. In fact, scaling storage performance is far easier compared with the difficulties encounters in scaling processor performance in NUMA systems. Storage performance is achieved by properly distributing IO over: 1) multiple independent PCI-E ports (system memory and IO bandwith is key) 2) multiple RAID controllers or host bus adapters (HBAs) 3) multiple storage IO channels (SAS or FC, complete path) most importantly,...(read more)

Read the article
Storage Configuration

- by jchang

Storage performance is not inherently complicated subject. The concepts are relatively simple. In fact, scaling storage performance is far easier compared with the difficulties encounters in scaling processor performance in NUMA systems. Storage performance is achieved by properly distributing IO over: 1) multiple independent PCI-E ports (system memory and IO bandwith is key) 2) multiple RAID controllers or host bus adapters (HBAs) 3) multiple storage IO channels (SAS or FC, complete path) most importantly,...(read more)

Read the article
The History of Digital Storage [Infographic]

- by Jason Fitzpatrick

From punch cards to hard drives to cloud based storage, how we stash our data away has changed quite a bit in the last century. Courtesy of Mashable, we have an infographic detailing the evolution of storage and comparing storage size, speed, and prices over the decades. Hit up the link below for a higher resolution image. The History of Digital Storage [Mashable] How To Create a Customized Windows 7 Installation Disc With Integrated Updates How to Get Pro Features in Windows Home Versions with Third Party Tools HTG Explains: Is ReadyBoost Worth Using?

Read the article
What is the definition of "Big Data"?

- by Ben

Is there one? All the definitions I can find describe the size, complexity / variety or velocity of the data. Wikipedia's definition is the only one I've found with an actual number Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set. However, this seemingly contradicts the MIKE2.0 definition, referenced in the next paragraph, which indicates that "big" data can be small and that 100,000 sensors on an aircraft creating only 3GB of data could be considered big. IBM despite saying that: Big data is more simply than a matter of size. have emphasised size in their definition. O'Reilly has stressed "volume, velocity and variety" as well. Though explained well, and in more depth, the definition seems to be a re-hash of the others - or vice-versa of course. I think that a Computer Weekly article title sums up a number of articles fairly well "What is big data and how can it be used to gain competitive advantage". But ZDNet wins with the following from 2012: “Big Data” is a catch phrase that has been bubbling up from the high performance computing niche of the IT market... If one sits through the presentations from ten suppliers of technology, fifteen or so different definitions are likely to come forward. Each definition, of course, tends to support the need for that supplier’s products and services. Imagine that. Basically "big data" is "big" in some way shape or form. What is "big"? Is it quantifiable at the current time? If "big" is unquantifiable is there a definition that does not rely solely on generalities?

Read the article
Big Data – Operational Databases Supporting Big Data – Columnar, Graph and Spatial Database – Day 14 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the Key-Value Pair Databases and Document Databases in the Big Data Story. In this article we will understand the role of Columnar, Graph and Spatial Database supporting Big Data Story. Now we will see a few of the examples of the operational databases. Relational Databases (The day before yesterday’s post) NoSQL Databases (The day before yesterday’s post) Key-Value Pair Databases (Yesterday’s post) Document Databases (Yesterday’s post) Columnar Databases (Tomorrow’s post) Graph Databases (Today’s post) Spatial Databases (Today’s post) Columnar Databases Relational Database is a row store database or a row oriented database. Columnar databases are column oriented or column store databases. As we discussed earlier in Big Data we have different kinds of data and we need to store different kinds of data in the database. When we have columnar database it is very easy to do so as we can just add a new column to the columnar database. HBase is one of the most popular columnar databases. It uses Hadoop file system and MapReduce for its core data storage. However, remember this is not a good solution for every application. This is particularly good for the database where there is high volume incremental data is gathered and processed. Graph Databases For a highly interconnected data it is suitable to use Graph Database. This database has node relationship structure. Nodes and relationships contain a Key Value Pair where data is stored. The major advantage of this database is that it supports faster navigation among various relationships. For example, Facebook uses a graph database to list and demonstrate various relationships between users. Neo4J is one of the most popular open source graph database. One of the major dis-advantage of the Graph Database is that it is not possible to self-reference (self joins in the RDBMS terms) and there might be real world scenarios where this might be required and graph database does not support it. Spatial Databases We all use Foursquare, Google+ as well Facebook Check-ins for location aware check-ins. All the location aware applications figure out the position of the phone with the help of Global Positioning System (GPS). Think about it, so many different users at different location in the world and checking-in all together. Additionally, the applications now feature reach and users are demanding more and more information from them, for example like movies, coffee shop or places see. They are all running with the help of Spatial Databases. Spatial data are standardize by the Open Geospatial Consortium known as OGC. Spatial data helps answering many interesting questions like “Distance between two locations, area of interesting places etc.” When we think of it, it is very clear that handing spatial data and returning meaningful result is one big task when there are millions of users moving dynamically from one place to another place & requesting various spatial information. PostGIS/OpenGIS suite is very popular spatial database. It runs as a layer implementation on the RDBMS PostgreSQL. This makes it totally unique as it offers best from both the worlds. Courtesy: mushroom network Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Hive. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
The Best Articles for Backing Up and Syncing Your Data

- by Lori Kaufman

World Backup Day is March 31st and we decided to provide you with some useful information to make backing up your data easier. We’ve published articles about backing up various types of data and settings both offline and online. There’s all kinds of settings on your computer to backup in addition to your personal data, such as Wi-Fi passwords, drivers, and settings for programs like web browsers, Office, and Windows Live Writer. There are also many tools available to help you keep your data and settings backed up. Make Your Own Windows 8 Start Button with Zero Memory Usage Reader Request: How To Repair Blurry Photos HTG Explains: What Can You Find in an Email Header?

Read the article
Can't save data for a member in a data form

- by RahulS

Implied sharing is an old thing everyone knows the reasons and solutions of that, still little theory about that: With Essbase implied sharing, some members are shared even if you do not explicitly set them as shared. These members are implied shared members. When an implied share relationship is created, each implied member assumes the other member’s value. Essbase assumes (or implies) a shared member relationship in these situations: 1. A parent has only one child 2. A parent has only one child that consolidates to the parent In a Planning form that contains members with an implied sharing relationship, when a value is added for the parent, the child assumes the same value after the form is saved. Likewise, if a value is added for the child, the parent usually assumes the same value after a form is saved.For example, when a calculation script or load rule populates an implied share member, the other implied share member assumes the value of the member populated by the calculation script or load rule. The last value calculated or imported takes precedence. The result is the same whether you refer to the parent or the child as a variable in a calculation script. For more information have a look at: http://docs.oracle.com/cd/E17236_01/epm.1112/hp_admin_11122/ch14s11.html Now the issue which we are going to talk about is We loose data on save even when the parent is dynamic calc and has a single child. A dynamic calc parent to a single child: If we design the form with following selection: In the data form we will find parent below the member and this is by design whenever you make a selection using commands to select all the member below parent, always children will appear before the parent: Lets try to enter data, Save it Now, try to change the way we selected members Here we go: Now the question again why this behavior: 1. Data from Planning data form passes to Essbase row by row, 2. Because in data form the child member appears before the parent, 3. First, data goes to Essbase for child (SingleStoreChild), 4. Then when Planning passes the data for parent there was #Missing or No data, 5. Over writes the data to #missing. PS: As we know that dynamic calc members are calculated on the fly they are not allocated with any memory in the Essbase, here the parent was dynamic calc and it was pointing to same memory as child in the background, when Planning was passing data to Essbase for second row it has updated the child with missing data.(Little confusing, let me know if you need more explanation) 6. As one of the solutions just change the order of appearance of parent and child. Cheers..!!! Rahul S. https://www.facebook.com/pages/HyperionPlanning/117320818374228

Read the article
Online Storage and security concerns

- by Megge

I plan to set up a small fileserver. I already own a small server at HostEurope (VirtualServer L, 250GB space), but they don't offer enough space (there is the HostEurope Cloud, but paying for bandwidth isn't an option here, video-streaming should be possible) Requirements summarized: Storage: 2TB, Users: ~15, Filesizes: < 100GB, should be easily reachable (Mount as a networkdrive or at least have solid "communication" software) My first question would be: Where can I get halfway affordable online storages? And how should I connect them to my server? Getting an additional server is a bit overkill, as I know no hoster which allows 2 TB on a small 2 Ghz Dual Core 2 GB RAM thingy (that would be enough by far, I just need much space), and connecting it via NFS or FTP over Internet seems a bit strange and cripples performance. Do you have any advice where I could get that storage service from? (I sent HostEurope a custom request today, but they didn't answer till now. If they can provide me with that space, this question will be irrelevant, but the 2nd one is the more important one anway, don't do much more than recommend me some based on experience, you don't have to crawl hours through hosting services) livedrive for example offers 5 TB for 17€ / month, I'd be happy with 2 TB for 20 €, the caveat is: It doesn't allow multiple users, which leads me to my second question: Where are the security problems? Which protocol is sufficient (I want private and "public" folders etc. the usual "every user has its own and a public space"-thing), secure and fast? (I'd tend to (S)FTP, problem with FTP is: Most of those hosting services don't even allow FTP with mutliple users and single users lead me into "hacking" a solution (you could map the basic folder structure on the main server and just mount every subfolder from the storage, things get difficult with a public folder with 644 permissions though) Is useing something like PKI or 802.1X overkill for private uses?

Read the article
Enterprise Manager 12c ? ZFS Storage Appliance

- by user13138569

?????????????? Enterprise Manager 12c ??? Sun ZFS Storage Appliance ????????????????????? ???Enterprise Manager ?? Sun ZFS Storage Appliance ?????????????? Enterprise Manager ????????????????? 3??? Sun ZFS Stoarage Appliance ??????????????????? My Oracle Support ???Oracle Technology Network ???????????????????????????? ?????????????????????????? Oracle ZFS Storage Appliance Plugin Downloads Sun ZFS Storage Appliance ????????????????????????????? P.3 ???????????Appliance ???????????? Workflow ?????????? Enterprise Manager ???????????? P.10 ???????????????????????????????????????????Enterprise Manager 11g ??????????????????????? ??????????????????????????? ??????????????????????Sun ZFS Storage Appliance ??????????Database ???????????????????????????????Enterprise Manager ???????????????????????

Read the article
Is Data Science “Science”?

- by BuckWoody

I hold the term “science” in very high esteem. I grew up on the Space Coast in Florida, and eventually worked at the Kennedy Space Center, surrounded by very intelligent people who worked in various scientific fields. Recently a new term has entered the computing dialog – “Data Scientist”. Since it’s not a standard term, it has a lot of definitions, and in fact has been disputed as a correct term. After all, the reasoning goes, if there’s no such thing as “Data Science” then how can there be a Data Scientist? This argument has been made before, albeit with a different term – “Computer Science”. In Peter Denning’s excellent article “Is Computer Science Science” (April 2005/Vol. 48, No. 4 COMMUNICATIONS OF THE ACM) there are many points that separate “science” from “engineering” and even “art”. I won’t repeat the content of that article here (I recommend you read it on your own) but will leverage the points he makes there. Definition of Science To ask the question “is data science ‘science’” then we need to start with a definition of terms. Various references put the definition into the same basic areas: Study of the physical world Systematic and/or disciplined study of a subject area ...and then they include the things studied, the bodies of knowledge and so on. The word itself comes from Latin, and means merely “to know” or “to study to know”. Greek divides knowledge further into “truth” (episteme), and practical use or effects (tekhne). Normally computing falls into the second realm. Definition of Data Science And now a more controversial definition: Data Science. This term is so new and perhaps so niche that the major dictionaries haven’t yet picked it up (my OED reference is older – can’t afford to pop for the online registration at present). Researching the term's general use I created an amalgam of the definitions this way: “Studying and applying mathematical and other techniques to derive information from complex data sets.” Using this definition, data science certainly seems to be science - it's learning about and studying some object or area using systematic methods. But implicit within the definition is the word “application”, which makes the process more akin to engineering or even technology than science. In fact, I find that using these techniques – and data itself – part of science, not science itself. I leave out the concept of studying data patterns or algorithms as part of this discipline. That is actually a domain I see within research, mathematics or computer science. That of course is a type of science, but does not seek for practical applications. As part of the argument against calling it “Data Science”, some point to the scientific method of creating a hypothesis, testing with controls, testing results against the hypothesis, and documenting for repeatability. These are not steps that we often take in working with data. We normally start with a question, and fit patterns and algorithms to predict outcomes and find correlations. In this way Data Science is more akin to statistics (and in fact makes heavy use of them) in the process rather than starting with an assumption and following on with it. So, is Data Science “Science”? I’m uncertain – and I’m uncertain it matters. Even if we are facing rampant “title inflation” these days (does anyone introduce themselves as a secretary or supervisor anymore?) I can tolerate the term at least from the intent that we use data to study problems across a wide spectrum, rather than restricting it to a single domain. And I also understand those who have worked hard to achieve the very honorable title of “scientist” who have issues with those who borrow the term without asking. What do you think? Science, or not? Does it matter?

Read the article
Data Modeling Resources

- by Dejan Sarka

You can find many different data modeling resources. It is impossible to list all of them. I selected only the most valuable ones for me, and, of course, the ones I contributed to. Books Chris J. Date: An Introduction to Database Systems – IMO a “must” to understand the relational model correctly. Terry Halpin, Tony Morgan: Information Modeling and Relational Databases – meet the object-role modeling leaders. Chris J. Date, Nikos Lorentzos and Hugh Darwen: Time and Relational Theory, Second Edition: Temporal Databases in the Relational Model and SQL – all theory needed to manage temporal data. Louis Davidson, Jessica M. Moss: Pro SQL Server 2012 Relational Database Design and Implementation – the best SQL Server focused data modeling book I know by two of my friends. Dejan Sarka, et al.: MCITP Self-Paced Training Kit (Exam 70-441): Designing Database Solutions by Using Microsoft® SQL Server™ 2005 – SQL Server 2005 data modeling training kit. Most of the text is still valid for SQL Server 2008, 2008 R2, 2012 and 2014. Itzik Ben-Gan, Lubor Kollar, Dejan Sarka, Steve Kass: Inside Microsoft SQL Server 2008 T-SQL Querying – Steve wrote a chapter with mathematical background, and I added a chapter with theoretical introduction to the relational model. Itzik Ben-Gan, Dejan Sarka, Roger Wolter, Greg Low, Ed Katibah, Isaac Kunen: Inside Microsoft SQL Server 2008 T-SQL Programming – I added three chapters with theoretical introduction and practical solutions for the user-defined data types, dynamic schema and temporal data. Dejan Sarka, Matija Lah, Grega Jerkic: Training Kit (Exam 70-463): Implementing a Data Warehouse with Microsoft SQL Server 2012 – my first two chapters are about data warehouse design and implementation. Courses Data Modeling Essentials – I wrote a 3-day course for SolidQ. If you are interested in this course, which I could also deliver in a shorter seminar way, you can contact your closes SolidQ subsidiary, or, of course, me directly on addresses [email protected] or [email protected]. This course could also complement the existing courseware portfolio of training providers, which are welcome to contact me as well. Logical and Physical Modeling for Analytical Applications – online course I wrote for Pluralsight. Working with Temporal data in SQL Server – my latest Pluralsight course, where besides theory and implementation I introduce many original ways how to optimize temporal queries. Forthcoming presentations SQL Bits 12, July 17th – 19th, Telford, UK – I have a full-day pre-conference seminar Advanced Data Modeling Topics there.

Read the article
Deploying Data Mining Models using Model Export and Import

- by [email protected]

In this post, we'll take a look at how Oracle Data Mining facilitates model deployment. After building and testing models, a next step is often putting your data mining model into a production system -- referred to as model deployment. The ability to move data mining model(s) easily into a production system can greatly speed model deployment, and reduce the overall cost. Since Oracle Data Mining provides models as first class database objects, models can be manipulated using familiar database techniques and technology. For example, one or more models can be exported to a flat file, similar to a database table dump file (.dmp). This file can be moved to a different instance of Oracle Database EE, and then imported. All methods for exporting and importing models are based on Oracle Data Pump technology and found in the DBMS_DATA_MINING package. Before performing the actual export or import, a directory object must be created. A directory object is a logical name in the database for a physical directory on the host computer. Read/write access to a directory object is necessary to access the host computer file system from within Oracle Database. For our example, we'll work in the DMUSER schema. First, DMUSER requires the privilege to create any directory. This is often granted through the sysdba account. grant create any directory to dmuser; Now, DMUSER can create the directory object specifying the path where the exported model file (.dmp) should be placed. In this case, on a linux machine, we have the directory /scratch/oracle. CREATE OR REPLACE DIRECTORY dmdir AS '/scratch/oracle'; If you aren't sure of the exact name of the model or models to export, you can find the list of models using the following query: select model_name from user_mining_models; There are several options when exporting models. We can export a single model, multiple models, or all models in a schema using the following procedure calls: BEGIN DBMS_DATA_MINING.EXPORT_MODEL ('MY_MODEL.dmp','dmdir','name =''MY_DT_MODEL'''); END; BEGIN DBMS_DATA_MINING.EXPORT_MODEL ('MY_MODELS.dmp','dmdir', 'name IN (''MY_DT_MODEL'',''MY_KM_MODEL'')'); END; BEGIN DBMS_DATA_MINING.EXPORT_MODEL ('ALL_DMUSER_MODELS.dmp','dmdir'); END; A .dmp file can be imported into another schema or database using the following procedure call, for example: BEGIN DBMS_DATA_MINING.IMPORT_MODEL('MY_MODELS.dmp', 'dmdir'); END; As with models from any data mining tool, when moving a model from one environment to another, care needs to be taken to ensure the transformations that prepare the data for model building are matched (with appropriate parameters and statistics) in the system where the model is deployed. Oracle Data Mining provides automatic data preparation (ADP) and embedded data preparation (EDP) to reduce, or possibly eliminate, the need to explicitly transport transformations with the model. In the case of ADP, ODM automatically prepares the data and includes the necessary transformations in the model itself. In the case of EDP, users can associate their own transformations with attributes of a model. These transformations are automatically applied when applying the model to data, i.e., scoring. Exporting and importing a model with ADP or EDP results in these transformations being immediately available with the model in the production system.

Read the article
Data, Log and Temp file placement

- by jchang

First especially for all the people with SAN storage, drive letters are of no consequence. What matters is the actual physical disk layout. Forget capacity, pay attention to the number of spindles supporting each RAID group. If the RAID group is shared with other application, make sure there the SLA guarantees read and write latency. One very large company conducted a stress test in the QA environment. The SAN admin carved the LUNs from the same pool of disks as production, but thought he had a really...(read more)

Read the article
Why Oracle Data Integrator for Big Data?

- by Mala Narasimharajan

Big Data is everywhere these days - but what exactly is it? It’s data that comes from a multitude of sources – not only structured data, but unstructured data as well. The sheer volume of data is mindboggling – here are a few examples of big data: climate information collected from sensors, social media information, digital pictures, log files, online video files, medical records or online transaction records. These are just a few examples of what constitutes big data. Embedded in big data is tremendous value and being able to manipulate, load, transform and analyze big data is key to enhancing productivity and competitiveness. The value of big data lies in its propensity for greater in-depth analysis and data segmentation -- in turn giving companies detailed information on product performance, customer preferences and inventory. Furthermore, by being able to store and create more data in digital form, “big data can unlock significant value by making information transparent and usable at much higher frequency." (McKinsey Global Institute, May 2011) Oracle's flagship product for bulk data movement and transformation, Oracle Data Integrator, is a critical component of Oracle’s Big Data strategy. ODI provides automation, bulk loading, and validation and transformation capabilities for Big Data while minimizing the complexities of using Hadoop. Specifically, the advantages of ODI in a Big Data scenario are due to pre-built Knowledge Modules that drive processing in Hadoop. This leverages the graphical UI to load and unload data from Hadoop, perform data validations and create mapping expressions for transformations. The Knowledge Modules provide a key jump-start and eliminate a significant amount of Hadoop development. Using Oracle Data Integrator together with Oracle Big Data Connectors, you can simplify the complexities of mapping, accessing, and loading big data (via NoSQL or HDFS) but also correlating your enterprise data – this correlation may require integrating across heterogeneous and standards-based environments, connecting to Oracle Exadata, or sourcing via a big data platform such as Oracle Big Data Appliance. To learn more about Oracle Data Integration and Big Data, download our resource kit to see the latest in whitepapers, webinars, downloads, and more… or go to our website on www.oracle.com/bigdata

Read the article
General Policies and Procedures for Maintaining the Value of Data Assets

Here is a general list for policies and procedures regarding maintaining the value of data assets. Data Backup Policies and Procedures Backups are very important when dealing with data because there is always the chance of losing data due to faulty hardware or a user activity. So the need for a strategic backup system should be mandatory for all companies. This being said, in the real world some companies that I have worked for do not really have a good data backup plan. Typically when companies tend to take this kind of approach in data backups usually the data is not really recoverable. Unfortunately when companies do not regularly test their backup plans they get a false sense of security because they think that they are covered. However, I can tell you from personal and professional experience that a backup plan/system is never fully implemented until it is regularly tested prior to the time when it actually needs to be used. Disaster Recovery Plan Expanding on Backup Policies and Procedures, a company needs to also have a disaster recovery plan in order to protect its data in case of a catastrophic disaster. Disaster recovery plans typically encompass how to restore all of a company’s data and infrastructure back to a restored operational status. Most Disaster recovery plans also include time estimates on how long each step of the disaster recovery plan should take to be executed. It is important to note that disaster recovery plans are never fully implemented until they have been tested just like backup plans. Disaster recovery plans should be tested regularly so that the business can be confident in not losing any or minimal data due to a catastrophic disaster. Firewall Policies and Content Filters One way companies can protect their data is by using a firewall to separate their internal network from the outside. Firewalls allow for enabling or disabling network access as data passes through it by applying various defined restrictions. Furthermore firewalls can also be used to prevent access from the internal network to the outside by these same factors. Common Firewall Restrictions Destination/Sender IP Address Destination/Sender Host Names Domain Names Network Ports Companies can also desire to restrict what their network user’s view on the internet through things like content filters. Content filters allow a company to track what webpages a person has accessed and can also restrict user’s access based on established rules set up in the content filter. This device and/or software can block access to domains or specific URLs based on a few factors. Common Content Filter Criteria Known malicious sites Specific Page Content Page Content Theme Anti-Virus/Mal-ware Polices Fortunately, most companies utilize antivirus programs on all computers and servers for good reason, virus have been known to do the following: Corrupt/Invalidate Data, Destroy Data, and Steal Data. Anti-Virus applications are a great way to prevent any malicious application from being able to gain access to a company’s data. However, anti-virus programs must be constantly updated because new viruses are always being created, and the anti-virus vendors need to distribute updates to their applications so that they can catch and remove them. Data Validation Policies and Procedures Data validation is very important to ensure that only accurate information is stored. The existence of invalid data can cause major problems when businesses attempt to use data for knowledge based decisions and for performance reporting. Data Scrubbing Policies and Procedures Data scrubbing is valuable to companies in one of two ways. The first can be used to clean data prior to being analyzed for report generation. The second is that it allows companies to remove things like personally Identifiable information from its data prior to transmit it between multiple environments or if the information is sent to an external location. An example of this can be seen with medical records in regards to HIPPA laws that prohibit the storage of specific personal and medical information. Additionally, I have professionally run in to a scenario where the Canadian government does not allow any Canadian’s personal information to be stored on a server not located in Canada. Encryption Practices The use of encryption is very valuable when a company needs to any personal information. This allows users with the appropriated access levels to view or confirm the existence or accuracy of data within a system by either decrypting the information or encrypting a piece of data and comparing it to the stored version. Additionally, if for some unforeseen reason the data got in to the wrong hands then they would have to first decrypt the data before they could even be able to read it. Encryption just adds and additional layer of protection around data itself. Standard Normalization Practices The use of standard data normalization practices is very important when dealing with data because it can prevent allot of potential issues by eliminating the potential for unnecessary data duplication. Issues caused by data duplication include excess use of data storage, increased chance for invalidated data, and over use of data processing. Network and Database Security/Access Policies Every company has some form of network/data access policy even if they have none. These policies help secure data from being seen by inappropriate users along with preventing the data from being updated or deleted by users. In addition, without a good security policy there is a large potential for data to be corrupted by unassuming users or even stolen. Data Storage Policies Data storage polices are very important depending on how they are implemented especially when a company is trying to utilize them in conjunction with other policies like Data Backups. I have worked at companies where all network user folders are constantly backed up, and if a user wanted to ensure the existence of a piece of data in the form of a file then they had to store that file in their network folder. Conversely, I have also worked in places where when a user logs on or off of the network there entire user profile is backed up. Training Policies One of the biggest ways to prevent data loss and ensure that data will remain a company asset is through training. The practice of properly train employees on how to work with in systems that access data is crucial when trying to ensure a company’s data will remain an asset. Users need to be trained on how to manipulate a company’s data in order to perform their tasks to reduce the chances of invalidating data.

Read the article
Drive not able to be added to Storage Spaces

- by Bram Vanroy

I am having difficulties trying to create a storage space. I have four Hard Drives in my computer. Samsung 128Gb SSD x2 Caviar Green 2TB Older 320 Gb drive I want to merge the two last ones. The problem is, that the 2TB drive does not show up in the configuration screen.: I formatted both hard drives so that can't be it. Any help is appreciated. Edit: larger view: http://bramvanroy.be/files/images/storagespaces.jpg

Read the article
What will be the better way for data retrieval on application that needs to handle limited amount of data?

- by Milanix

Just moved this question from Stack Overflow. Since, adding my code snippets itself would make this question really long. Instead, I am pretty interested in knowing a better ways for data retrieval on application that needs to handle limited amount of data which isn't updated regularly. Let's take this example: I am writing an application which gets a schedule as an XML from server. I have written a logic in order to parse XML version and update database only if the version is newer than the local version. Although the update is checked automatically/manually on daily basis based on user preference, the actual version update happens only once per few months or so. Since, this is done by some other authority which doesn't provide API but, rather inform publicly on their changes. The actual XML contains a "(n number of groups)(days in a week) (n number of schedule)" . The group is usually 6 and the number of schedule is usually 2. So basically there would usually be only around 100 strings. Now although I have used SQLite at the moment. I want to know how to make update on database. Should I show progress dialog that the application is updating and exit the app when it's done? Since, my updates are infrequent i don't think this will really harm user experience but, is there any better ways to do it? Because I don't want update to be made when user is searching which is done using database. This will cause an database already open exception. At least I have faced this problem before. Is it better to rather parse XML every time when user wants to view certain things or to use SQLite? Since, I make lots of use of adapter in my app to create lists, will that degrade the performance?

Read the article
Importing Multiple Schemas to a Model in Oracle SQL Developer Data Modeler

- by thatjeffsmith

Your physical data model might stretch across multiple Oracle schemas. Or maybe you just want a single diagram containing tables, views, etc. spanning more than a single user in the database. The process for importing a data dictionary is the same, regardless if you want to suck in objects from one schema, or many schemas. Let’s take a quick look at how to get started with a data dictionary import. I’m using Oracle SQL Developer in this example. The process is nearly identical in Oracle SQL Developer Data Modeler – the only difference being you’ll use the ‘File’ menu to get started versus the ‘File – Data Modeler’ menu in SQL Developer. Remember, the functionality is exactly the same whether you use SQL Developer or SQL Developer Data Modeler when it comes to the data modeling features – you’ll just have a cleaner user interface in SQL Developer Data Modeler. Importing a Data Dictionary to a Model You’ll want to open or create your model first. You can import objects to an existing or new model. The easiest way to get started is to simply open the ‘Browser’ under the View menu. The Browser allows you to navigate your open designs/models You’ll see an ‘Untitled_1′ model by default. I’ve renamed mine to ‘hr_sh_scott_demo.’ Now go back to the File menu, and expand the ‘Data Modeler’ section, and select ‘Import – Data Dictionary.’ This is a fancy way of saying, ‘suck objects out of the database into my model’ Connect! If you haven’t already defined a connection to the database you want to reverse engineer, you’ll need to do that now. I’m going to assume you already have that connection – so select it, and hit the ‘Next’ button. Select the Schema(s) to be imported Select one or more schemas you want to import The schemas selected on this page of the wizard will dictate the lists of tables, views, synonyms, and everything else you can choose from in the next wizard step to import. For brevity, I have selected ALL tables, views, and synonyms from 3 different schemas: HR SCOTT SH Once I hit the ‘Finish’ button in the wizard, SQL Developer will interrogate the database and add the objects to our model. The Big Model and the 3 Little Models I can now see ALL of the objects I just imported in the ‘hr_sh_scott_demo’ relational model in my design tree, and in my relational diagram. Quick Tip: Oracle SQL Developer calls what most folks think of as a ‘Physical Model’ the ‘Relational Model.’ Same difference, mostly. In SQL Developer, a Physical model allows you to define partitioning schemes, advanced storage parameters, and add your PL/SQL code. You can have multiple physical models per relational models. For example I might have a 4 Node RAC in Production that uses partitioning, but in test/dev, only have a single instance with no partitioning. I can have models for both of those physical implementations. The list of tables in my relational model Wouldn’t it be nice if I could segregate the objects based on their schema? Good news, you can! And it’s done by default Several of you might already know where I’m going with this – SUBVIEWS. You can easily create a ‘SubView’ by selecting one or more objects in your model or diagram and add them to a new SubView. SubViews are just mini-models. They contain a subset of objects from the main model. This is very handy when you want to break your model into smaller, more digestible parts. The model information is identical across the model and subviews, so you don’t have to worry about making a change in one place and not having it propagate across your design. SubViews can be used as filters when you create reports and exports as well. So instead of generating a PDF for everything, just show me what’s in my ‘ABC’ subview. But, I don’t want to do any work! Remember, I’m really lazy. More good news – it’s already done by default! The schemas are automatically used to create default SubViews Auto-Navigate to the Object in the Diagram In the subview tree node, right-click on the object you want to navigate to. You can ask to be taken to the main model view or to the SubView location. If you haven’t already opened the SubView in the diagram, it will be automatically opened for you. The SubView diagram only contains the objects from that SubView Your SubView might still be pretty big, many dozens of objects, so don’t forget about the ‘Navigator‘ either! In summary, use the ‘Import’ feature to add existing database objects to your model. If you import from multiple schemas, take advantage of the default schema based SubViews to help you manage your models! Sometimes less is more!

Read the article
New: ZFS Storage Appliance Videos

- by Roxana Babiciu

Check out part one of a new video series for ZFS Storage Appliance. In video #1, you’ll learn about the advantages built into Oracle’s ZS3 Storage Appliance that come from the unique position that Oracle holds in the market. In video #2, you’ll learn how best to monitor large ZS3 installations as well as the use of Enterprise Manager as a complement to dtrace analytics at the ZFS Storage Appliance device level.

Read the article
Sun ZFS Storage Appliance 7120, 7320 ??????

- by user13138569

Sun ZFS Storage Appliance 7320 ????????????????????? 7120 ??????????????? ????????????????????????????? Sun ZFS Storage Appliance 7320 ????? Memory 24GB,48GB,72GB 96GB, 144GB ??????(??) 288TB 432TB ???????????(??) 16 24 Shelf? 4 6 Sun ZFS Storage Appliance 7120 ????? Memory 24GB 48GB ????????????????????????????????? 2011.1.2.1 ??????????

Read the article
sg_map & lsscsi showing old storage version

- by PratapSingh

I am using SUN storage and recently upgraded/refreshed my ISCSI LUN storage. We have replicated old storage to new storage and attached to our servers. I can see at SUN storage side that storage is attached to server and also from server when I run the below command it prints the following output : iscsiadm -m session tcp: [1] 10.1.1.10:3260,2 iqn.86-03.com.sun:02:afsfsf58-c56a-6ba8-a944-addd258687cd The above storage is SUN STORAGE 7420 But when I run sg_map or lsscsi command it prints different version: lsscsi disk SUN Sun Storage 7410 1.0 /dev/sda disk SUN Sun Storage 7410 1.0 /dev/sdb disk SUN Sun Storage 7410 1.0 /dev/sdc disk SUN Sun Storage 7410 1.0 /dev/sdd Output of ls on "/dev/disk/by-path/" ls -1 /dev/disk/by-path/ ip-10.1.1.10:3260-iscsi-iqn.86-03.com.sun:02:afsfsf58-c56a-6ba8-a944-addd258687cd-lun-0 ip-10.1.1.10:3260-iscsi-iqn.86-03.com.sun:02:afsfsf58-c56a-6ba8-a944-addd258687cd-lun-0-part1 ip-10.1.1.10:3260-iscsi-iqn.86-03.com.sun:02:afsfsf58-c56a-6ba8-a944-addd258687cd-lun-18 ip-10.1.1.10:3260-iscsi-iqn.86-03.com.sun:02:afsfsf58-c56a-6ba8-a944-addd258687cd-lun-18-part1 ip-10.1.1.10:3260-iscsi-iqn.86-03.com.sun:02:afsfsf58-c56a-6ba8-a944-addd258687cd-lun-2 ip-10.1.1.10:3260-iscsi-iqn.86-03.com.sun:02:afsfsf58-c56a-6ba8-a944-addd258687cd-lun-2-part1 ip-10.1.1.10:3260-iscsi-iqn.86-03.com.sun:02:afsfsf58-c56a-6ba8-a944-addd258687cd-lun-4 ip-10.1.1.10:3260-iscsi-iqn.86-03.com.sun:02:afsfsf58-c56a-6ba8-a944-addd258687cd-lun-4-part1 ip-10.1.1.10:3260-iscsi-iqn.86-03.com.sun:02:afsfsf58-c56a-6ba8-a944-addd258687cd-lun-6 ip-10.1.1.10:3260-iscsi-iqn.86-03.com.sun:02:afsfsf58-c56a-6ba8-a944-addd258687cd-lun-6-part1 I have rebooted server twice but still I am getting the same output as given above.

Read the article
Deploying Data Mining Models using Model Export and Import, Part 2

- by [email protected]

In my last post, Deploying Data Mining Models using Model Export and Import, we explored using DBMS_DATA_MINING.EXPORT_MODEL and DBMS_DATA_MINING.IMPORT_MODEL to enable moving a model from one system to another. In this post, we'll look at two distributed scenarios that make use of this capability and a tip for easily moving models from one machine to another using only Oracle Database, not an external file transport mechanism, such as FTP. The first scenario, consider a company with geographically distributed business units, each collecting and managing their data locally for the products they sell. Each business unit has in-house data analysts that build models to predict which products to recommend to customers in their space. A central telemarketing business unit also uses these models to score new customers locally using data collected over the phone. Since the models recommend different products, each customer is scored using each model. This is depicted in Figure 1.Figure 1: Target instance importing multiple remote models for local scoring In the second scenario, consider multiple hospitals that collect data on patients with certain types of cancer. The data collection is standardized, so each hospital collects the same patient demographic and other health / tumor data, along with the clinical diagnosis. Instead of each hospital building it's own models, the data is pooled at a central data analysis lab where a predictive model is built. Once completed, the model is distributed to hospitals, clinics, and doctor offices who can score patient data locally.Figure 2: Multiple target instances importing the same model from a source instance for local scoring Since this blog focuses on model export and import, we'll only discuss what is necessary to move a model from one database to another. Here, we use the package DBMS_FILE_TRANSFER, which can move files between Oracle databases. The script is fairly straightforward, but requires setting up a database link and directory objects. We saw how to create directory objects in the previous post. To create a database link to the source database from the target, we can use, for example: create database link SOURCE1_LINK connect to <schema> identified by <password> using 'SOURCE1'; Note that 'SOURCE1' refers to the service name of the remote database entry in your tnsnames.ora file. From SQL*Plus, first connect to the remote database and export the model. Note that the model_file_name does not include the .dmp extension. This is because export_model appends "01" to this name. Next, connect to the local database and invoke DBMS_FILE_TRANSFER.GET_FILE and import the model. Note that "01" is eliminated in the target system file name. connect <source_schema>/<password>@SOURCE1_LINK; BEGIN DBMS_DATA_MINING.EXPORT_MODEL ('EXPORT_FILE_NAME' || '.dmp', 'MY_SOURCE_DIR_OBJECT', 'name =''MY_MINING_MODEL'''); END; connect <target_schema>/<password>; BEGIN DBMS_FILE_TRANSFER.GET_FILE ('MY_SOURCE_DIR_OBJECT', 'EXPORT_FILE_NAME' || '01.dmp', 'SOURCE1_LINK', 'MY_TARGET_DIR_OBJECT', 'EXPORT_FILE_NAME' || '.dmp' ); DBMS_DATA_MINING.IMPORT_MODEL ('EXPORT_FILE_NAME' || '.dmp', 'MY_TARGET_DIR_OBJECT'); END; To clean up afterward, you may want to drop the exported .dmp file at the source and the transferred file at the target. For example, utl_file.fremove('&directory_name', '&model_file_name' || '.dmp');

Read the article
Abstract Data Type and Data Structure

- by mark075

It's quite difficult for me to understand these terms. I searched on google and read a little on Wikipedia but I'm still not sure. I've determined so far that: Abstract Data Type is a definition of new type, describes its properties and operations. Data Structure is an implementation of ADT. Many ADT can be implemented as the same Data Structure. If I think right, array as ADT means a collection of elements and as Data Structure, how it's stored in a memory. Stack is ADT with push, pop operations, but can we say about stack data structure if I mean I used stack implemented as an array in my algorithm? And why heap isn't ADT? It can be implemented as tree or an array.

Read the article
ASP.NET and HTML5 Local Storage

- by Stephen Walther

My favorite feature of HTML5, hands-down, is HTML5 local storage (aka DOM storage). By taking advantage of HTML5 local storage, you can dramatically improve the performance of your data-driven ASP.NET applications by caching data in the browser persistently. Think of HTML5 local storage like browser cookies, but much better. Like cookies, local storage is persistent. When you add something to browser local storage, it remains there when the user returns to the website (possibly days or months later). Importantly, unlike the cookie storage limitation of 4KB, you can store up to 10 megabytes in HTML5 local storage. Because HTML5 local storage works with the latest versions of all modern browsers (IE, Firefox, Chrome, Safari), you can start taking advantage of this HTML5 feature in your applications right now. Why use HTML5 Local Storage? I use HTML5 Local Storage in the JavaScript Reference application: http://Superexpert.com/JavaScriptReference The JavaScript Reference application is an HTML5 app that provides an interactive reference for all of the syntax elements of JavaScript (You can read more about the application and download the source code for the application here). When you open the application for the first time, all of the entries are transferred from the server to the browser (all 300+ entries). All of the entries are stored in local storage. When you open the application in the future, only changes are transferred from the server to the browser. The benefit of this approach is that the application performs extremely fast. When you click the details link to view details on a particular entry, the entry details appear instantly because all of the entries are stored on the client machine. When you perform key-up searches, by typing in the filter textbox, matching entries are displayed very quickly because the entries are being filtered on the local machine. This approach can have a dramatic effect on the performance of any interactive data-driven web application. Interacting with data on the client is almost always faster than interacting with the same data on the server. Retrieving Data from the Server In the JavaScript Reference application, I use Microsoft WCF Data Services to expose data to the browser. WCF Data Services generates a REST interface for your data automatically. Here are the steps: Create your database tables in Microsoft SQL Server. For example, I created a database named ReferenceDB and a database table named Entities. Use the Entity Framework to generate your data model. For example, I used the Entity Framework to generate a class named ReferenceDBEntities and a class named Entities. Expose your data through WCF Data Services. I added a WCF Data Service to my project and modified the data service class to look like this: using System.Data.Services; using System.Data.Services.Common; using System.Web; using JavaScriptReference.Models; namespace JavaScriptReference.Services { [System.ServiceModel.ServiceBehavior(IncludeExceptionDetailInFaults = true)] public class EntryService : DataService<ReferenceDBEntities> { // This method is called only once to initialize service-wide policies. public static void InitializeService(DataServiceConfiguration config) { config.UseVerboseErrors = true; config.SetEntitySetAccessRule("*", EntitySetRights.All); config.DataServiceBehavior.MaxProtocolVersion = DataServiceProtocolVersion.V2; } // Define a change interceptor for the Products entity set. [ChangeInterceptor("Entries")] public void OnChangeEntries(Entry entry, UpdateOperations operations) { if (!HttpContext.Current.Request.IsAuthenticated) { throw new DataServiceException("Cannot update reference unless authenticated."); } } } } The WCF data service is named EntryService. Notice that it derives from DataService<ReferenceEntitites>. Because it derives from DataService<ReferenceEntities>, the data service exposes the contents of the ReferenceEntitiesDB database. In the code above, I defined a ChangeInterceptor to prevent un-authenticated users from making changes to the database. Anyone can retrieve data through the service, but only authenticated users are allowed to make changes. After you expose data through a WCF Data Service, you can use jQuery to retrieve the data by performing an Ajax call. For example, I am using an Ajax call that looks something like this to retrieve the JavaScript entries from the EntryService.svc data service: $.ajax({ dataType: "json", url: “/Services/EntryService.svc/Entries”, success: function (result) { var data = callback(result["d"]); } }); Notice that you must unwrap the data using result[“d”]. After you unwrap the data, you have a JavaScript array of the entries. I’m transferring all 300+ entries from the server to the client when the application is opened for the first time. In other words, I transfer the entire database from the server to the client, once and only once, when the application is opened for the first time. The data is transferred using JSON. Here is a fragment: { "d" : [ { "__metadata": { "uri": "http://superexpert.com/javascriptreference/Services/EntryService.svc/Entries(1)", "type": "ReferenceDBModel.Entry" }, "Id": 1, "Name": "Global", "Browsers": "ff3_6,ie8,ie9,c8,sf5,es3,es5", "Syntax": "object", "ShortDescription": "Contains global variables and functions", "FullDescription": "<p>\nThe Global object is determined by the host environment. In web browsers, the Global object is the same as the windows object.\n</p>\n<p>\nYou can use the keyword <code>this</code> to refer to the Global object when in the global context (outside of any function).\n</p>\n<p>\nThe Global object holds all global variables and functions. For example, the following code demonstrates that the global <code>movieTitle</code> variable refers to the same thing as <code>window.movieTitle</code> and <code>this.movieTitle</code>.\n</p>\n<pre>\nvar movieTitle = \"Star Wars\";\nconsole.log(movieTitle === this.movieTitle); // true\nconsole.log(movieTitle === window.movieTitle); // true\n</pre>\n", "LastUpdated": "634298578273756641", "IsDeleted": false, "OwnerId": null }, { "__metadata": { "uri": "http://superexpert.com/javascriptreference/Services/EntryService.svc/Entries(2)", "type": "ReferenceDBModel.Entry" }, "Id": 2, "Name": "eval(string)", "Browsers": "ff3_6,ie8,ie9,c8,sf5,es3,es5", "Syntax": "function", "ShortDescription": "Evaluates and executes JavaScript code dynamically", "FullDescription": "<p>\nThe following code evaluates and executes the string \"3+5\" at runtime.\n</p>\n<pre>\nvar result = eval(\"3+5\");\nconsole.log(result); // returns 8\n</pre>\n<p>\nYou can rewrite the code above like this:\n</p>\n<pre>\nvar result;\neval(\"result = 3+5\");\nconsole.log(result);\n</pre>", "LastUpdated": "634298580913817644", "IsDeleted": false, "OwnerId": 1 } … ]} I worried about the amount of time that it would take to transfer the records. According to Google Chome, it takes about 5 seconds to retrieve all 300+ records on a broadband connection over the Internet. 5 seconds is a small price to pay to avoid performing any server fetches of the data in the future. And here are the estimated times using different types of connections using Fiddler: Notice that using a modem, it takes 33 seconds to download the database. 33 seconds is a significant chunk of time. So, I would not use the approach of transferring the entire database up front if you expect a significant portion of your website audience to connect to your website with a modem. Adding Data to HTML5 Local Storage After the JavaScript entries are retrieved from the server, the entries are stored in HTML5 local storage. Here’s the reference documentation for HTML5 storage for Internet Explorer: http://msdn.microsoft.com/en-us/library/cc197062(VS.85).aspx You access local storage by accessing the windows.localStorage object in JavaScript. This object contains key/value pairs. For example, you can use the following JavaScript code to add a new item to local storage: <script type="text/javascript"> window.localStorage.setItem("message", "Hello World!"); </script> You can use the Google Chrome Storage tab in the Developer Tools (hit CTRL-SHIFT I in Chrome) to view items added to local storage: After you add an item to local storage, you can read it at any time in the future by using the window.localStorage.getItem() method: <script type="text/javascript"> window.localStorage.setItem("message", "Hello World!"); </script> You only can add strings to local storage and not JavaScript objects such as arrays. Therefore, before adding a JavaScript object to local storage, you need to convert it into a JSON string. In the JavaScript Reference application, I use a wrapper around local storage that looks something like this: function Storage() { this.get = function (name) { return JSON.parse(window.localStorage.getItem(name)); }; this.set = function (name, value) { window.localStorage.setItem(name, JSON.stringify(value)); }; this.clear = function () { window.localStorage.clear(); }; } If you use the wrapper above, then you can add arbitrary JavaScript objects to local storage like this: var store = new Storage(); // Add array to storage var products = [ {name:"Fish", price:2.33}, {name:"Bacon", price:1.33} ]; store.set("products", products); // Retrieve items from storage var products = store.get("products"); Modern browsers support the JSON object natively. If you need the script above to work with older browsers then you should download the JSON2.js library from: https://github.com/douglascrockford/JSON-js The JSON2 library will use the native JSON object if a browser already supports JSON. Merging Server Changes with Browser Local Storage When you first open the JavaScript Reference application, the entire database of JavaScript entries is transferred from the server to the browser. Two items are added to local storage: entries and entriesLastUpdated. The first item contains the entire entries database (a big JSON string of entries). The second item, a timestamp, represents the version of the entries. Whenever you open the JavaScript Reference in the future, the entriesLastUpdated timestamp is passed to the server. Only records that have been deleted, updated, or added since entriesLastUpdated are transferred to the browser. The OData query to get the latest updates looks like this: http://superexpert.com/javascriptreference/Services/EntryService.svc/Entries?$filter=(LastUpdated%20gt%20634301199890494792L) If you remove URL encoding, the query looks like this: http://superexpert.com/javascriptreference/Services/EntryService.svc/Entries?$filter=(LastUpdated gt 634301199890494792L) This query returns only those entries where the value of LastUpdated > 634301199890494792 (the version timestamp). The changes – new JavaScript entries, deleted entries, and updated entries – are merged with the existing entries in local storage. The JavaScript code for performing the merge is contained in the EntriesHelper.js file. The merge() method looks like this: merge: function (oldEntries, newEntries) { // concat (this performs the add) oldEntries = oldEntries || []; var mergedEntries = oldEntries.concat(newEntries); // sort this.sortByIdThenLastUpdated(mergedEntries); // prune duplicates (this performs the update) mergedEntries = this.pruneDuplicates(mergedEntries); // delete mergedEntries = this.removeIsDeleted(mergedEntries); // Sort this.sortByName(mergedEntries); return mergedEntries; }, The contents of local storage are then updated with the merged entries. I spent several hours writing the merge() method (much longer than I expected). I found two resources to be extremely useful. First, I wrote extensive unit tests for the merge() method. I wrote the unit tests using server-side JavaScript. I describe this approach to writing unit tests in this blog entry. The unit tests are included in the JavaScript Reference source code. Second, I found the following blog entry to be super useful (thanks Nick!): http://nicksnettravels.builttoroam.com/post/2010/08/03/OData-Synchronization-with-WCF-Data-Services.aspx One big challenge that I encountered involved timestamps. I originally tried to store an actual UTC time as the value of the entriesLastUpdated item. I quickly discovered that trying to work with dates in JSON turned out to be a big can of worms that I did not want to open. Next, I tried to use a SQL timestamp column. However, I learned that OData cannot handle the timestamp data type when doing a filter query. Therefore, I ended up using a bigint column in SQL and manually creating the value when a record is updated. I overrode the SaveChanges() method to look something like this: public override int SaveChanges(SaveOptions options) { var changes = this.ObjectStateManager.GetObjectStateEntries( EntityState.Modified | EntityState.Added | EntityState.Deleted); foreach (var change in changes) { var entity = change.Entity as IEntityTracking; if (entity != null) { entity.LastUpdated = DateTime.Now.Ticks; } } return base.SaveChanges(options); } Notice that I assign Date.Now.Ticks to the entity.LastUpdated property whenever an entry is modified, added, or deleted. Summary After building the JavaScript Reference application, I am convinced that HTML5 local storage can have a dramatic impact on the performance of any data-driven web application. If you are building a web application that involves extensive interaction with data then I recommend that you take advantage of this new feature included in the HTML5 standard.

Read the article
Big Data – Various Learning Resources – How to Start with Big Data? – Day 20 of 21

- by Pinal Dave

In yesterday’s blog post we learned how to become a Data Scientist for Big Data. In this article we will go over various learning resources related to Big Data. In this series we have covered many of the most essential details about Big Data. At the beginning of this series, I have encouraged readers to send me questions. One of the most popular questions is - “I want to learn more about Big Data. Where can I learn it?” This is indeed a great question as there are plenty of resources out to learn about Big Data and it is indeed difficult to select on one resource to learn Big Data. Hence I decided to write here a few of the very important resources which are related to Big Data. Learn from Pluralsight Pluralsight is a global leader in high-quality online training for hardcore developers. It has fantastic Big Data Courses and I started to learn about Big Data with the help of Pluralsight. Here are few of the courses which are directly related to Big Data. Big Data: The Big Picture Big Data Analytics with Tableau NoSQL: The Big Picture Understanding NoSQL Data Analysis Fundamentals with Tableau I encourage all of you start with this video course as they are fantastic fundamentals to learn Big Data. Learn from Apache Resources at Apache are single point the most authentic learning resources. If you want to learn fundamentals and go deep about every aspect of the Big Data, I believe you must understand various concepts in Apache’s library. I am pretty impressed with the documentation and I am personally referencing it every single day when I work with Big Data. I strongly encourage all of you to bookmark following all the links for authentic big data learning. Haddop - The Apache Hadoop® project develops open-source software for reliable, scalable, distributed computing. Ambari: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which include support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heat maps and ability to view MapReduce, Pig and Hive applications visually along with features to diagnose their performance characteristics in a user-friendly manner. Avro: A data serialization system. Cassandra: A scalable multi-master database with no single points of failure. Chukwa: A data collection system for managing large distributed systems. HBase: A scalable, distributed database that supports structured data storage for large tables. Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying. Mahout: A Scalable machine learning and data mining library. Pig: A high-level data-flow language and execution framework for parallel computation. ZooKeeper: A high-performance coordination service for distributed applications. Learn from Vendors One of the biggest issues with about learning Big Data is setting up the environment. Every Big Data vendor has different environment request and there are lots of things require to set up Big Data framework. Many of the users do not start with Big Data as they are afraid about the resources required to set up framework as well as a time commitment. Here Hortonworks have created fantastic learning environment. They have created Sandbox with everything one person needs to learn Big Data and also have provided excellent tutoring along with it. Sandbox comes with a dozen hands-on tutorial that will guide you through the basics of Hadoop as well it contains the Hortonworks Data Platform. I think Hortonworks did a fantastic job building this Sandbox and Tutorial. Though there are plenty of different Big Data Vendors I have decided to list only Hortonworks due to their unique setup. Please leave a comment if there are any other such platform to learn Big Data. I will include them over here as well. Learn from Books There are indeed few good books out there which one can refer to learn Big Data. Here are few good books which I have read. I will update the list as I will learn more. Ethics of Big Data Balancing Risk and Innovation Big Data for Dummies Head First Data Analysis: A Learner’s Guide to Big Numbers, Statistics, and Good Decisions If you search on Amazon there are millions of the books but I think above three books are a great set of books and it will give you great ideas about Big Data. Once you go through above books, you will have a clear idea about what is the next step you should follow in this series. You will be capable enough to make the right decision for yourself. Tomorrow In tomorrow’s blog post we will wrap up this series of Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article

Search Results

Search found 61623 results on 2465 pages for 'data storage'.

Page 3/2465 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by jchang

- by jchang

- by Jason Fitzpatrick

- by Ben

- by Pinal Dave

- by Lori Kaufman

- by RahulS

- by Megge

- by user13138569

- by BuckWoody

- by Dejan Sarka

- by [email protected]

- by jchang

- by Mala Narasimharajan

- by Bram Vanroy

- by Milanix

- by thatjeffsmith

- by Roxana Babiciu

- by user13138569

- by PratapSingh

- by [email protected]

- by mark075

- by Stephen Walther

- by Pinal Dave

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >