Search Results

Search found 59 results on 3 pages for 'hbase'.

Page 2/3 | < Previous Page | 1 2 3  | Next Page >

  • Searches (and general querying) with HBase and/or Cassandra (best practices?)

    - by alexeypro
    I have User model object with quite few fields (properties, if you wish) in it. Say "firstname", "lastname", "city" and "year-of-birth". Each user also gets "unique id". I want to be able to search by them. How do I do that properly? How to do that at all? My understanding (will work for pretty much any key-value storage -- first goes key, then value) u:123456789 = serialized_json_object ("u" as a simple prefix for user's keys, 123456789 is "unique id"). Now, thinking that I want to be able to search by firstname and lastname, I can save in: f:Steve = u:384734807,u:2398248764,u:23276263 f:Alex = u:12324355,u:121324334 so key is "f" - which is prefix for firstnames, and "Steve" is actual firstname. For "u:Steve" we save as value all user id's who are "Steve's". That makes every search very-very easy. Querying by few fields (properties) -- say by firstname (i.e. "Steve") and lastname (i.e. "l:Anything") is still easy - first get list of user ids from "f:Steve", then list from "l:Anything", find crossing user ids, an here you go. Problems (and there are quite a few): Saving, updating, deleting user is a pain. It has to be atomic and consistent operation. Also, if we have size of value limited to some value - then we are in (potential) trouble. And really not of an answer here. Only zipping the list of user ids? Not too cool, though. What id we want to add new field to search by. Eventually. Say by "city". We certainly can do the same way "c:Los Angeles" = ..., "c:Chicago" = ..., but if we didn't foresee all those "search choices" from the very beginning, then we will have to be able to create some night job or something to go by all existing User records and update those "c:CITY" for them... Quite a big job! Problems with locking. User "u:123" updates his name "Alex", and user "u:456" updates his name "Alex". They both have to update "f:Alex" with their id's. That means either we get into overwriting problem, or one update will wait for another (and imaging if there are many of them?!). What's the best way of doing that? Keeping in mind that I want to search by many fields? P.S. Please, the question is about HBase/Cassandra/NoSQL/Key-Value storages. Please please - no advices to use MySQL and "read about" SELECTs; and worry about scaling problems "later". There is a reason why I asked MY question exactly the way I did. :-)

    Read the article

  • How to achieve zero down time

    - by Hiral Lakdavala
    For an application we want to achieve zero database and application down time using Active Active configuration. Our dB is Oracle Following are my questions: How can we achieve active active configuration in Oracle? Will introducing Cassandra/HBase(or any other No SQL dbs) cloud help in zero downtime or it is only for fast retrieval of data in a large db? Any other options? Thanks and Regards, Hiral

    Read the article

  • Pig: Count number of keys in a map

    - by Donald Miner
    I'd like to count the number of keys in a map in Pig. I could write a UDF to do this, but I was hoping there would be an easier way. data = LOAD 'hbase://MARS1' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage( 'A:*', '-loadKey true -caching=100000') AS (id:bytearray, A_map:map[]); In the code above, I want to basically build a histogram of id and how many items in column family A that key has. In hoping, I tried c = FOREACH data GENERATE id, COUNT(A_map); but that unsurprisingly didn't work. Or, perhaps someone can suggest a better way to do this entirely. If I can't figure this out soon I'll just write a Java MapReduce job or a Pig UDF.

    Read the article

  • Column-oriented DBMS and JOIN operations

    - by André
    From some of the research I've done on NoSQL, column-oriented databases (like HBase or Cassandra) seem to solve the problem of costly JOIN operations, but I don't get how this approach solves this problem. Can anyone explain it to me and/or link me to interesting documentation regarding this area? Thanks

    Read the article

  • running Hadoop software on office computers (when they are idle)

    - by Shahbaz
    Is there a project which helps setup a Hadoop cluster on office desktops, when they are idle? I'd like to experiment with Hadoop/MR/hbase but don't have acces to 5-10 computers. The computers at work are idle after hours and are connected to each other through a very high speed connection. What's more, data on these computers stays within our network so there is no privacy issue. In order for this to work I need a fairly light weight monitor running on each machine. When the computer has been idle for X hours, it will join the cluster. If the user logs on, it has to drop out of the cluster and return all CPU/memory back. Does something like this exist?

    Read the article

  • kick off a map reduce job from my java/mysql webapp

    - by Brian
    Hi guys, I need a bit of archecture advice. I have a java based webapp, with a JPA based ORM backed onto a mysql relational database. Now, as part of the application I have a batch job that compares thousands of database records with each other. This job has become too time consuming and needs to be parallelized. I'm looking at using mapreduce and hadoop in order to do this. However, I'm not too sure about how to integrate this into my current architecture. I think the easiest initial solution is to find a way to push data from mysql into hadoop jobs. I have done some initial research on this and found the following relevant information and possibilities: 1) https://issues.apache.org/jira/browse/HADOOP-2536 this gives an interesting overview of some inbuilt JDBC support 2) This article http://architects.dzone.com/articles/tools-moving-sql-database describes some third party tools to move data from mysql to hadoop. To be honest I'm just starting out with learning about hbase and hadoop but I really don't know how to integrate this into my webapp. Any advice is greatly appreciated. cheers, Brian

    Read the article

  • Microsoft Azure s'enrichit de deux nouveaux services pour le NoSQL et la recherche en plein texte sans oublier Apache HBase pour Azure HDInsight

    Microsoft Azure s'enrichit de deux nouveaux services pour le NoSQL et la recherche en plein texte sans oublier Apache HBase pour Azure HDInsight Microsoft a publié sous forme d'aperçu de nouvelles fonctionnalités pour son service cloud Microsoft Azure, il s'agit de deux services : l'un dédié aux bases données et l'autre pour la recherche en plein texte.Le service dédié aux bases de données a été baptisé Azure DocumentDB, il permettra de compléter les bases de données relationnelles avec un service...

    Read the article

  • Difference between Document-oriented-DB and Bigtable clones

    - by chen
    We are looking for a suitable storage engine for our weblog history data. We looked at Bigtable's paper and understand it is suitable to us well. However, I also understand that Document-oriented-DB such as MongoDB seems to provide a little more powerful schema power -- i.e, it can model our data as well. I wonder how nowadays ppl choose a scalable NoSQL DB --- I read enough articles like "we looked at A, B and C, and we decided to use C". But I'd like to see some benchmark number. What I am saying is that if MongoDB and the like can provide same level of performance as Bigtable clones, why don't web companies choose it (preparing to deal with various potentially more complex data problem)? Thanks, By the way, I read an article (which convinced me at the moment) saying Cassandra does not fit the M/R operation, any comments?

    Read the article

  • How to pick random (small) data samples using Map/Reduce?

    - by Andrei Savu
    I want to write a map/reduce job to select a number of random samples from a large dataset based on a row level condition. I want to minimize the number of intermediate keys. Pseudocode: for each row if row matches condition put the row.id in the bucket if the bucket is not already large enough Have you done something like this? Is there any well known algorithm? A sample containing sequential rows is also good enough. Thanks.

    Read the article

  • Reverse and Forward DNS set up correctly but sometimes MapReduce job fails

    - by phodamentals
    Ever since we switched over our cluster to communicate via private interfaces and created a DNS server with correct forward and reverse lookup zones, we get this message before the M/R job runs: ERROR org.apache.hadoop.hbase.mapreduce.TableInputFormatBase - Cannot resolve the host name for /192.168.3.9 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '9.3.168.192.in-addr.arpa' A dig and nslookup both show that the reverse and forward look-ups both get good responses with no errors from within the cluster. Shortly after these messages, the job runs...but every once in awhile we get a NPE: Exception in thread "main" java.lang.NullPointerException INFO app.insights.search.SearchIndexUpdater - at org.apache.hadoop.net.DNS.reverseDns(DNS.java:93) INFO app.insights.search.SearchIndexUpdater - at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.reverseDNS(TableInputFormatBase.java:219) INFO app.insights.search.SearchIndexUpdater - at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:184) INFO app.insights.search.SearchIndexUpdater - at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1063) INFO app.insights.search.SearchIndexUpdater - at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1080) INFO app.insights.search.SearchIndexUpdater - at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174) INFO app.insights.search.SearchIndexUpdater - at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:992) INFO app.insights.search.SearchIndexUpdater - at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:945) INFO app.insights.search.SearchIndexUpdater - at java.security.AccessController.doPrivileged(Native Method) INFO app.insights.search.SearchIndexUpdater - at javax.security.auth.Subject.doAs(Subject.java:415) INFO app.insights.search.SearchIndexUpdater - at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) INFO app.insights.search.SearchIndexUpdater - at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:945) INFO app.insights.search.SearchIndexUpdater - at org.apache.hadoop.mapreduce.Job.submit(Job.java:566) INFO app.insights.search.SearchIndexUpdater - at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596) INFO app.insights.search.SearchIndexUpdater - at app.insights.search.correlator.comments.CommentCorrelator.main(CommentCorrelator.java:72 Does anyone else who has set-up a CDH Hadoop cluster on a private network w/DNS server get this? CDH 4.3.1 with MR1 2.0.0 and HBase 0.94.6

    Read the article

  • Running interrelated methods continously using java

    - by snehalata
    I have written application which downloads data from a website every 10 mins and writes to a file.Then these files are merged into one file and then R program is run on this merged file to perform sentiment analysis and result is stored in hbase. I want the process of merging files,running R and then storing to HBase to run continuosly on the downloaded data. For running R,we are running R script from java program.We have used Runtime.getRuntime().exec() method to run R program but it doesn't wait for R program to complete and method in the next line starts executing.Using p.waitFor() did not help . What approach should I use to do merge then run R and finally store results in Hbase?Should I use timer class??

    Read the article

  • Big Data Appliance X4-2 Release Announcement

    - by Jean-Pierre Dijcks
    Today we are announcing the release of the 3rd generation Big Data Appliance. Read the Press Release here. Software Focus The focus for this 3rd generation of Big Data Appliance is: Comprehensive and Open - Big Data Appliance now includes all Cloudera Software, including Back-up and Disaster Recovery (BDR), Search, Impala, Navigator as well as the previously included components (like CDH, HBase and Cloudera Manager) and Oracle NoSQL Database (CE or EE). Lower TCO then DIY Hadoop Systems Simplified Operations while providing an open platform for the organization Comprehensive security including the new Audit Vault and Database Firewall software, Apache Sentry and Kerberos configured out-of-the-box Hardware Update A good place to start is to quickly review the hardware differences (no price changes!). On a per node basis the following is a comparison between old and new (X3-2) hardware: Big Data Appliance X3-2 Big Data Appliance X4-2 CPU 2 x 8-Core Intel® Xeon® E5-2660 (2.2 GHz) 2 x 8-Core Intel® Xeon® E5-2650 V2 (2.6 GHz) Memory 64GB 64GB Disk 12 x 3TB High Capacity SAS 12 x 4TB High Capacity SAS InfiniBand 40Gb/sec 40Gb/sec Ethernet 10Gb/sec 10Gb/sec For all the details on the environmentals and other useful information, review the data sheet for Big Data Appliance X4-2. The larger disks give BDA X4-2 33% more capacity over the previous generation while adding faster CPUs. Memory for BDA is expandable to 512 GB per node and can be done on a per-node basis, for example for NameNodes or for HBase region servers, or for NoSQL Database nodes. Software Details More details in terms of software and the current versions (note BDA follows a three monthly update cycle for Cloudera and other software): Big Data Appliance 2.2 Software Stack Big Data Appliance 2.3 Software Stack Linux Oracle Linux 5.8 with UEK 1 Oracle Linux 6.4 with UEK 2 JDK JDK 6 JDK 7 Cloudera CDH CDH 4.3 CDH 4.4 Cloudera Manager CM 4.6 CM 4.7 And like we said at the beginning it is important to understand that all other Cloudera components are now included in the price of Oracle Big Data Appliance. They are fully supported by Oracle and available for all BDA customers. For more information: Big Data Appliance Data Sheet Big Data Connectors Data Sheet Oracle NoSQL Database Data Sheet (CE | EE) Oracle Advanced Analytics Data Sheet

    Read the article

  • Cheatsheet: 2010 04.01 ~ 04.07

    - by gOODiDEA
    Web Web Performance Best Practices: How masters.com re-designed their site to boost performance – and what that re-design missed What’s wrong with extending the DOM John Resig on Advanced Javascript to Improve your Web App .NET Hammock for REST - a REST library for .NET Programming Windows Phone 7 Series by Charlez Petzold – Free EBook Testing the Lock-Free Queue Some Last-Minute New C# 4.0 Features - while (x --> 0) { Console.WriteLine("x = {0}", x); } Better Coding with Visual Studio 2010 Revisiting Asynchronous ASP.NET Pages Database Understanding RAID for SQL Server – Part 2 Cassandra Jump Start For The Windows Developer Cassandra Internals – Writing - Cassandra Write Operation Performance Explained Cassandra Internals – Reading - Cassandra Reads Performance Explained MongoDB Growing Up: Release 1.4 and Commercial Support by 10gen Why NoSQL Will Not Die How Many Hard Drives Do I Need to Support SQL Server? Other Presentation: CouchDB and Lucene MongoDB Cacti Graphs HBase vs Cassandra: why we moved How to use the DedicatedDumpFile registry value to overcome space limitations on the system drive when capturing a system memory dump

    Read the article

  • Big Data – Various Learning Resources – How to Start with Big Data? – Day 20 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned how to become a Data Scientist for Big Data. In this article we will go over various learning resources related to Big Data. In this series we have covered many of the most essential details about Big Data. At the beginning of this series, I have encouraged readers to send me questions. One of the most popular questions is - “I want to learn more about Big Data. Where can I learn it?” This is indeed a great question as there are plenty of resources out to learn about Big Data and it is indeed difficult to select on one resource to learn Big Data. Hence I decided to write here a few of the very important resources which are related to Big Data. Learn from Pluralsight Pluralsight is a global leader in high-quality online training for hardcore developers.  It has fantastic Big Data Courses and I started to learn about Big Data with the help of Pluralsight. Here are few of the courses which are directly related to Big Data. Big Data: The Big Picture Big Data Analytics with Tableau NoSQL: The Big Picture Understanding NoSQL Data Analysis Fundamentals with Tableau I encourage all of you start with this video course as they are fantastic fundamentals to learn Big Data. Learn from Apache Resources at Apache are single point the most authentic learning resources. If you want to learn fundamentals and go deep about every aspect of the Big Data, I believe you must understand various concepts in Apache’s library. I am pretty impressed with the documentation and I am personally referencing it every single day when I work with Big Data. I strongly encourage all of you to bookmark following all the links for authentic big data learning. Haddop - The Apache Hadoop® project develops open-source software for reliable, scalable, distributed computing. Ambari: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which include support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heat maps and ability to view MapReduce, Pig and Hive applications visually along with features to diagnose their performance characteristics in a user-friendly manner. Avro: A data serialization system. Cassandra: A scalable multi-master database with no single points of failure. Chukwa: A data collection system for managing large distributed systems. HBase: A scalable, distributed database that supports structured data storage for large tables. Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying. Mahout: A Scalable machine learning and data mining library. Pig: A high-level data-flow language and execution framework for parallel computation. ZooKeeper: A high-performance coordination service for distributed applications. Learn from Vendors One of the biggest issues with about learning Big Data is setting up the environment. Every Big Data vendor has different environment request and there are lots of things require to set up Big Data framework. Many of the users do not start with Big Data as they are afraid about the resources required to set up framework as well as a time commitment. Here Hortonworks have created fantastic learning environment. They have created Sandbox with everything one person needs to learn Big Data and also have provided excellent tutoring along with it. Sandbox comes with a dozen hands-on tutorial that will guide you through the basics of Hadoop as well it contains the Hortonworks Data Platform. I think Hortonworks did a fantastic job building this Sandbox and Tutorial. Though there are plenty of different Big Data Vendors I have decided to list only Hortonworks due to their unique setup. Please leave a comment if there are any other such platform to learn Big Data. I will include them over here as well. Learn from Books There are indeed few good books out there which one can refer to learn Big Data. Here are few good books which I have read. I will update the list as I will learn more. Ethics of Big Data Balancing Risk and Innovation Big Data for Dummies Head First Data Analysis: A Learner’s Guide to Big Numbers, Statistics, and Good Decisions If you search on Amazon there are millions of the books but I think above three books are a great set of books and it will give you great ideas about Big Data. Once you go through above books, you will have a clear idea about what is the next step you should follow in this series. You will be capable enough to make the right decision for yourself. Tomorrow In tomorrow’s blog post we will wrap up this series of Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Scalable distributed file system for blobs like images and other documents

    - by Pinnacle
    Cassandra & HBase both do not efficiently support storage of blobs like images. Storing directly on HDFS stresses the Namenode because of huge number of files. Facebook uses Haystack for images and attachments storage, but this is not open source. So is Lustre a good choice for distributed blob storage? I have read that Amazon S3 is used by many, but this would cost money and personally, I would not like to rely on third party system. What are other suggestions?

    Read the article

  • Is OpenStack suitable as a fault tolerant DB host?

    - by Jit B
    I am trying to design a fault tolerant DB cluster (schema does not matter) that would not require much maintenance. After looking at almost everything from MySQL to MongoDB to HBase I still find that no DB is easily scalable - Cassandra comes close but it has its own set of problems. So I was thinking what if I run something like MySQL or OrientDB on top of a large openstack VM. The VM would be fault tolerant by itself so I dont need to do it st DB level. Is it viable? Has it been done before? If not then what are the possible problems with this approach?

    Read the article

  • Loading from Multiple Data Sources with Oracle Loader for Hadoop

    - by mannamal
    Oracle Loader for Hadoop can be used to load data from multiple data sources (for example Hive, HBase), and data in multiple formats (for example Apache weblogs, JSON files).   There are two ways to do this: (1) Use an input format implementation.  Oracle Loader for Hadoop includes several input format implementations.  In addition, a user can develop their own input format implementation for proprietary data sources and formats. (2) Leverage the capabilities of Hive, and use Oracle Loader for Hadoop to load from Hive. These approaches are discussed in our Oracle Open World 2013 presentation. 

    Read the article

  • What should be the considerations for choosing SQL/NoSQL?

    - by Yuval A
    Target application is a medium-sized website built to support several hundred-thousand users an hour, with an option to scale above that. Data model is rather simple, and caching potential is pretty high (~10:1 ratio of read to edit actions). What should be the considerations when coming to choose between a relational, SQL-based datastore to a NoSQL option (such as HBase and Cassandra)?

    Read the article

  • High performance querying - Sugestions please

    - by Alex Takitani
    Supposing that I have millions of user profiles, with hundreds of fields (name, gender, preferred pet and so on...). With database would You choose? Suppose that You have a Facebook like load. Speed is a must. Open Source preferred. I've read a lot about Cassandra, HBase, Mongo, Mysql... I just can't decide.....

    Read the article

  • Suggestions for open source testing tool for cloud computing

    - by vikraman
    Hi, I want to know if there is any open source testing tool for cloud computing. We have built a cloud framework with Xen, Eucalyptus, Hadoop, HBase as different layers. I am not looking at testing each of these tools separately, but i want to test them from the perspective of fitting into a cloud environment (for example scalability of xen hypervisor to handle multiple VMs). Would be great if you can suggest me some tool (open source) for the above.

    Read the article

  • Configuring memcached for a particular scenario

    - by pradeepchhetri
    I have a web application which queries opentsdb server(which in backend using Hbase cluster) for the datapoints of different metrics and using dygraph javascript graphing library, I am plotting those metrics. Since getting all the datapoints of past one day from opentsdb for a particular metric is itself taking nearly 2 seconds, my application which is plotting nearly 25 metrics is becoming very slow. In order to reduce this latency, I am thinking of using memcached module of php5 for caching all the queries. But I have few questions regarding memcached. Is there any way I can configure memcache to keep on updating its cache in the background by running some command line queries after particular interval of time. Is there any way I can configure memcache to always reply for a query using cache instead of first updating its cache because my application just plots datapoints for past one day. Missing out some datapoints is not that critical.

    Read the article

  • Talking JavaOne with Rock Star Raghavan Srinivas

    - by Janice J. Heiss
    Raghavan Srinivas, affectionately known as “Rags,” is a two-time JavaOne Rock Star (from 2005 and 2011) who, as a Developer Advocate at Couchbase, gets his hands dirty with emerging technology directions and trends. His general focus is on distributed systems, with a specialization in cloud computing. He worked on Hadoop and HBase during its early stages, has spoken at conferences world-wide on a variety of technical topics, conducted and organized Hands-on Labs and taught graduate classes.He has 20 years of hands-on software development and over 10 years of architecture and technology evangelism experience and has worked for Digital Equipment Corporation, Sun Microsystems, Intuit and Accenture. He has evangelized and influenced the architecture of numerous technologies including the early releases of JavaFX, Java, Java EE, Java and XML, Java ME, AJAX and Web 2.0, and Java Security.Rags will be giving these sessions at JavaOne 2012: CON3570 -- Autosharding Enterprise to Social Gaming Applications with NoSQL and Couchbase CON3257 -- Script Bowl 2012: The Battle of the JVM-Based Languages (with Guillaume Laforge, Aaron Bedra, Dick Wall, and Dr Nic Williams) Rags emphasized the importance of the Cloud: “The Cloud and the Big Data are popular technologies not merely because they are trendy, but, largely due to the fact that it's possible to do massive data mining and use that information for business advantage,” he explained. I asked him what we should know about Hadoop. “Hadoop,” he remarked, “is mainly about using commodity hardware and achieving unprecedented scalability. At the heart of all this is the Java Virtual Machine which is running on each of these nodes. The vision of taking the processing to where the data resides is made possible by Java and Hadoop.” And the most exciting thing happening in the world of Java today? “I read recently that Java projects on github.com are just off the charts when compared to other projects. It's exciting to realize the robust growth of Java and the degree of collaboration amongst Java programmers.” He encourages Java developers to take advantage of Java 7 for Mac OS X which is now available for download. At the same time, he also encourages us to read the caveats. Originally published on blogs.oracle.com/javaone.

    Read the article

  • Talking JavaOne with Rock Star Raghavan Srinivas

    - by Janice J. Heiss
    Raghavan Srinivas, affectionately known as “Rags,” is a two-time JavaOne Rock Star (from 2005 and 2011) who, as a Developer Advocate at Couchbase, gets his hands dirty with emerging technology directions and trends. His general focus is on distributed systems, with a specialization in cloud computing. He worked on Hadoop and HBase during its early stages, has spoken at conferences world-wide on a variety of technical topics, conducted and organized Hands-on Labs and taught graduate classes.He has 20 years of hands-on software development and over 10 years of architecture and technology evangelism experience and has worked for Digital Equipment Corporation, Sun Microsystems, Intuit and Accenture. He has evangelized and influenced the architecture of numerous technologies including the early releases of JavaFX, Java, Java EE, Java and XML, Java ME, AJAX and Web 2.0, and Java Security.Rags will be giving these sessions at JavaOne 2012: CON3570 -- Autosharding Enterprise to Social Gaming Applications with NoSQL and Couchbase CON3257 -- Script Bowl 2012: The Battle of the JVM-Based Languages (with Guillaume Laforge, Aaron Bedra, Dick Wall, and Dr Nic Williams) Rags emphasized the importance of the Cloud: “The Cloud and the Big Data are popular technologies not merely because they are trendy, but, largely due to the fact that it's possible to do massive data mining and use that information for business advantage,” he explained. I asked him what we should know about Hadoop. “Hadoop,” he remarked, “is mainly about using commodity hardware and achieving unprecedented scalability. At the heart of all this is the Java Virtual Machine which is running on each of these nodes. The vision of taking the processing to where the data resides is made possible by Java and Hadoop.” And the most exciting thing happening in the world of Java today? “I read recently that Java projects on github.com are just off the charts when compared to other projects. It's exciting to realize the robust growth of Java and the degree of collaboration amongst Java programmers.” He encourages Java developers to take advantage of Java 7 for Mac OS X which is now available for download. At the same time, he also encourages us to read the caveats.

    Read the article

< Previous Page | 1 2 3  | Next Page >