Search Results

Search found 196 results on 8 pages for 'mapreduce'.

Page 8/8 | < Previous Page | 4 5 6 7 8

Big Data – Interacting with Hadoop – What is Sqoop? – What is Zookeeper? – Day 17 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the Pig and Pig Latin in Big Data Story. In this article we will understand what is Sqoop and Zookeeper in Big Data Story. There are two most important components one should learn when learning about interacting with Hadoop – Sqoop and Zookper. What is Sqoop? Most of the business stores their data in RDBMS as well as other data warehouse solutions. They need a way to move data to the Hadoop system to do various processing and return it back to RDBMS from Hadoop system. The data movement can happen in real time or at various intervals in bulk. We need a tool which can help us move this data from SQL to Hadoop and from Hadoop to SQL. Sqoop (SQL to Hadoop) is such a tool which extract data from non-Hadoop data sources and transform them into the format which Hadoop can use it and later it loads them into HDFS. Essentially it is ETL tool where it Extracts, Transform and Load from SQL to Hadoop. The best part is that it also does extract data from Hadoop and loads them to Non-SQL (or RDBMS) data stores. Essentially, Sqoop is a command line tool which does SQL to Hadoop and Hadoop to SQL. It is a command line interpreter. It creates MapReduce job behinds the scene to import data from an external database to HDFS. It is very effective and easy to learn tool for nonprogrammers. What is Zookeeper? ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. In other words Zookeeper is a replicated synchronization service with eventual consistency. In simpler words – in Hadoop cluster there are many different nodes and one node is master. Let us assume that master node fails due to any reason. In this case, the role of the master node has to be transferred to a different node. The main role of the master node is managing the writers as that task requires persistence in order of writing. In this kind of scenario Zookeeper will assign new master node and make sure that Hadoop cluster performs without any glitch. Zookeeper is the Hadoop’s method of coordinating all the elements of these distributed systems. Here are few of the tasks which Zookeepr is responsible for. Zookeeper manages the entire workflow of starting and stopping various nodes in the Hadoop’s cluster. In Hadoop cluster when any processes need certain configuration to complete the task. Zookeeper makes sure that certain node gets necessary configuration consistently. In case of the master node fails, Zookeepr can assign new master node and make sure cluster works as expected. There many other tasks Zookeeper performance when it is about Hadoop cluster and communication. Basically without the help of Zookeeper it is not possible to design any new fault tolerant distributed application. Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Big Data Analytics. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Know your Data Lineage

- by Simon Elliston Ball

An academic paper without the footnotes isn’t an academic paper. Journalists wouldn’t base a news article on facts that they can’t verify. So why would anyone publish reports without being able to say where the data has come from and be confident of its quality, in other words, without knowing its lineage. (sometimes referred to as ‘provenance’ or ‘pedigree’) The number and variety of data sources, both traditional and new, increases inexorably. Data comes clean or dirty, processed or raw, unimpeachable or entirely fabricated. On its journey to our report, from its source, the data can travel through a network of interconnected pipes, passing through numerous distinct systems, each managed by different people. At each point along the pipeline, it can be changed, filtered, aggregated and combined. When the data finally emerges, how can we be sure that it is right? How can we be certain that no part of the data collection was based on incorrect assumptions, that key data points haven’t been left out, or that the sources are good? Even when we’re using data science to give us an approximate or probable answer, we cannot have any confidence in the results without confidence in the data from which it came. You need to know what has been done to your data, where it came from, and who is responsible for each stage of the analysis. This information represents your data lineage; it is your stack-trace. If you’re an analyst, suspicious of a number, it tells you why the number is there and how it got there. If you’re a developer, working on a pipeline, it provides the context you need to track down the bug. If you’re a manager, or an auditor, it lets you know the right things are being done. Lineage tracking is part of good data governance. Most audit and lineage systems require you to buy into their whole structure. If you are using Hadoop for your data storage and processing, then tools like Falcon allow you to track lineage, as long as you are using Falcon to write and run the pipeline. It can mean learning a new way of running your jobs (or using some sort of proxy), and even a distinct way of writing your queries. Other Hadoop tools provide a lot of operational and audit information, spread throughout the many logs produced by Hive, Sqoop, MapReduce and all the various moving parts that make up the eco-system. To get a full picture of what’s going on in your Hadoop system you need to capture both Falcon lineage and the data-exhaust of other tools that Falcon can’t orchestrate. However, the problem is bigger even that that. Often, Hadoop is just one piece in a larger processing workflow. The next step of the challenge is how you bind together the lineage metadata describing what happened before and after Hadoop, where ‘after’ could be a data analysis environment like R, an application, or even directly into an end-user tool such as Tableau or Excel. One possibility is to push as much as you can of your key analytics into Hadoop, but would you give up the power, and familiarity of your existing tools in return for a reliable way of tracking lineage? Lineage and auditing should work consistently, automatically and quietly, allowing users to access their data with any tool they require to use. The real solution, therefore, is to create a consistent method by which to bring lineage data from these data various disparate sources into the data analysis platform that you use, rather than being forced to use the tool that manages the pipeline for the lineage and a different tool for the data analysis. The key is to keep your logs, keep your audit data, from every source, bring them together and use the data analysis tools to trace the paths from raw data to the answer that data analysis provides.

Read the article
Fast Data - Big Data's achilles heel

- by thegreeneman

At OOW 2013 in Mark Hurd and Thomas Kurian's keynote, they discussed Oracle's Fast Data software solution stack and discussed a number of customers deploying Oracle's Big Data / Fast Data solutions and in particular Oracle's NoSQL Database. Since that time, there have been a large number of request seeking clarification on how the Fast Data software stack works together to deliver on the promise of real-time Big Data solutions. Fast Data is a software solution stack that deals with one aspect of Big Data, high velocity. The software in the Fast Data solution stack involves 3 key pieces and their integration: Oracle Event Processing, Oracle Coherence, Oracle NoSQL Database. All three of these technologies address a high throughput, low latency data management requirement. Oracle Event Processing enables continuous query to filter the Big Data fire hose, enable intelligent chained events to real-time service invocation and augments the data stream to provide Big Data enrichment. Extended SQL syntax allows the definition of sliding windows of time to allow SQL statements to look for triggers on events like breach of weighted moving average on a real-time data stream. Oracle Coherence is a distributed, grid caching solution which is used to provide very low latency access to cached data when the data is too big to fit into a single process, so it is spread around in a grid architecture to provide memory latency speed access. It also has some special capabilities to deploy remote behavioral execution for "near data" processing. The Oracle NoSQL Database is designed to ingest simple key-value data at a controlled throughput rate while providing data redundancy in a cluster to facilitate highly concurrent low latency reads. For example, when large sensor networks are generating data that need to be captured while analysts are simultaneously extracting the data using range based queries for upstream analytics. Another example might be storing cookies from user web sessions for ultra low latency user profile management, also leveraging that data using holistic MapReduce operations with your Hadoop cluster to do segmented site analysis. Understand how NoSQL plays a critical role in Big Data capture and enrichment while simultaneously providing a low latency and scalable data management infrastructure thru clustered, always on, parallel processing in a shared nothing architecture. Learn how easily a NoSQL cluster can be deployed to provide essential services in industry specific Fast Data solutions. See these technologies work together in a demonstration highlighting the salient features of these Fast Data enabling technologies in a location based personalization service. The question then becomes how do these things work together to deliver an end to end Fast Data solution. The answer is that while different applications will exhibit unique requirements that may drive the need for one or the other of these technologies, often when it comes to Big Data you may need to use them together. You may have the need for the memory latencies of the Coherence cache, but just have too much data to cache, so you use a combination of Coherence and Oracle NoSQL to handle extreme speed cache overflow and retrieval. Here is a great reference to how these two technologies are integrated and work together. Coherence & Oracle NoSQL Database. On the stream processing side, it is similar as with the Coherence case. As your sliding windows get larger, holding all the data in the stream can become difficult and out of band data may need to be offloaded into persistent storage. OEP needs an extreme speed database like Oracle NoSQL Database to help it continue to perform for the real time loop while dealing with persistent spill in the data stream. Here is a great resource to learn more about how OEP and Oracle NoSQL Database are integrated and work together. OEP & Oracle NoSQL Database.

Read the article
Find Port Number and Domain Name to connect to Hive Table

- by user1419563

I am new to Hive, MapReduce and Hadoop. I am using Putty to connect to hive table and access records in the tables. So what I did is- I opened Putty and in the host name I typed- ares-ingest.vip.host.com and then I click Open. And then I entered my username and password and then few commands to get to Hive sql. Below is the list what I did $ bash bash-3.00$ hive Hive history file=/tmp/rjamal/hive_job_log_rjamal_201207010451_1212680168.txt hive> set mapred.job.queue.name=hdmi-technology; hive> select * from table LIMIT 1; So my question is- I was trying to connect to Hive Tables using Squirrel SQL Client, so in that my Connection URL is- jdbc:hive://ares-ingest.vip.host.com:10000/default. So whenever I try to connect with these attributes, I always get Hive: Could not establish connection to ares-ingest.vip.host.com:10000/default: java.net.ConnectException: Connection timed out: connect. It might be possible I am using wrong port number or domain name here. Is there any way from the command prompt I can find out these two things, like what Domain Name and Port Number(where Hive server is running) should I use to connect to Hive table from Squirrel SQL Client. As I know host and port are determined by where the hive server is running

Read the article
How are you taking advantage of Multicore?

- by tgamblin

As someone in the world of HPC who came from the world of enterprise web development, I'm always curious to see how developers back in the "real world" are taking advantage of parallel computing. This is much more relevant now that all chips are going multicore, and it'll be even more relevant when there are thousands of cores on a chip instead of just a few. My questions are: How does this affect your software roadmap? I'm particularly interested in real stories about how multicore is affecting different software domains, so specify what kind of development you do in your answer (e.g. server side, client-side apps, scientific computing, etc). What are you doing with your existing code to take advantage of multicore machines, and what challenges have you faced? Are you using OpenMP, Erlang, Haskell, CUDA, TBB, UPC or something else? What do you plan to do as concurrency levels continue to increase, and how will you deal with hundreds or thousands of cores? If your domain doesn't easily benefit from parallel computation, then explaining why is interesting, too. Finally, I've framed this as a multicore question, but feel free to talk about other types of parallel computing. If you're porting part of your app to use MapReduce, or if MPI on large clusters is the paradigm for you, then definitely mention that, too. Update: If you do answer #5, mention whether you think things will change if there get to be more cores (100, 1000, etc) than you can feed with available memory bandwidth (seeing as how bandwidth is getting smaller and smaller per core). Can you still use the remaining cores for your application?

Read the article
Hadoop streaming with Python and python subprocess

- by Ganesh

I have established a basic hadoop master slave cluster setup and able to run mapreduce programs (including python) on the cluster. Now I am trying to run a python code which accesses a C binary and so I am using the subprocess module. I am able to use the hadoop streaming for a normal python code but when I include the subprocess module to access a binary, the job is getting failed. As you can see in the below logs, the hello executable is recognised to be used for the packaging, but still not able to run the code. . . packageJobJar: [/tmp/hello/hello, /app/hadoop/tmp/hadoop-unjar5030080067721998885/] [] /tmp/streamjob7446402517274720868.jar tmpDir=null JarBuilder.addNamedStream hello . . 12/03/07 22:31:32 INFO mapred.FileInputFormat: Total input paths to process : 1 12/03/07 22:31:32 INFO streaming.StreamJob: getLocalDirs(): [/app/hadoop/tmp/mapred/local] 12/03/07 22:31:32 INFO streaming.StreamJob: Running job: job_201203062329_0057 12/03/07 22:31:32 INFO streaming.StreamJob: To kill this job, run: 12/03/07 22:31:32 INFO streaming.StreamJob: /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=master:54311 -kill job_201203062329_0057 12/03/07 22:31:32 INFO streaming.StreamJob: Tracking URL: http://master:50030/jobdetails.jsp?jobid=job_201203062329_0057 12/03/07 22:31:33 INFO streaming.StreamJob: map 0% reduce 0% 12/03/07 22:32:05 INFO streaming.StreamJob: map 100% reduce 100% 12/03/07 22:32:05 INFO streaming.StreamJob: To kill this job, run: 12/03/07 22:32:05 INFO streaming.StreamJob: /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=master:54311 -kill job_201203062329_0057 12/03/07 22:32:05 INFO streaming.StreamJob: Tracking URL: http://master:50030/jobdetails.jsp?jobid=job_201203062329_0057 12/03/07 22:32:05 ERROR streaming.StreamJob: Job not Successful! 12/03/07 22:32:05 INFO streaming.StreamJob: killJob... Streaming Job Failed! Command I am trying is : hadoop jar contrib/streaming/hadoop-*streaming*.jar -mapper /home/hduser/MARS.py -reducer /home/hduser/MARS_red.py -input /user/hduser/mars_inputt -output /user/hduser/mars-output -file /tmp/hello/hello -verbose where hello is the C executable. It is a simple helloworld program which I am using to check the basic functioning. My Python code is : #!/usr/bin/env python import subprocess subprocess.call(["./hello"]) Any help with how to get the executable run with Python in hadoop streaming or help with debugging this will get me forward in this. Thanks, Ganesh

Read the article
Converting python collaborative filtering code to use Map Reduce

- by Neil Kodner

Using Python, I'm computing cosine similarity across items. given event data that represents a purchase (user,item), I have a list of all items 'bought' by my users. Given this input data (user,item) X,1 X,2 Y,1 Y,2 Z,2 Z,3 I build a python dictionary {1: ['X','Y'], 2 : ['X','Y','Z'], 3 : ['Z']} From that dictionary, I generate a bought/not bought matrix, also another dictionary(bnb). {1 : [1,1,0], 2 : [1,1,1], 3 : [0,0,1]} From there, I'm computing similarity between (1,2) by calculating cosine between (1,1,0) and (1,1,1), yielding 0.816496 I'm doing this by: items=[1,2,3] for item in items: for sub in items: if sub >= item: #as to not calculate similarity on the inverse sim = coSim( bnb[item], bnb[sub] ) I think the brute force approach is killing me and it only runs slower as the data gets larger. Using my trusty laptop, this calculation runs for hours when dealing with 8500 users and 3500 items. I'm trying to compute similarity for all items in my dict and it's taking longer than I'd like it to. I think this is a good candidate for MapReduce but I'm having trouble 'thinking' in terms of key/value pairs. Alternatively, is the issue with my approach and not necessarily a candidate for Map Reduce?

Read the article
kick off a map reduce job from my java/mysql webapp

- by Brian

Hi guys, I need a bit of archecture advice. I have a java based webapp, with a JPA based ORM backed onto a mysql relational database. Now, as part of the application I have a batch job that compares thousands of database records with each other. This job has become too time consuming and needs to be parallelized. I'm looking at using mapreduce and hadoop in order to do this. However, I'm not too sure about how to integrate this into my current architecture. I think the easiest initial solution is to find a way to push data from mysql into hadoop jobs. I have done some initial research on this and found the following relevant information and possibilities: 1) https://issues.apache.org/jira/browse/HADOOP-2536 this gives an interesting overview of some inbuilt JDBC support 2) This article http://architects.dzone.com/articles/tools-moving-sql-database describes some third party tools to move data from mysql to hadoop. To be honest I'm just starting out with learning about hbase and hadoop but I really don't know how to integrate this into my webapp. Any advice is greatly appreciated. cheers, Brian

Read the article
Patterns for non-layered applications

- by Paul Stovell

In Patterns of Enterprise Application Architecture, Martin Fowler writes: This book is thus about how you decompose an enterprise application into layers and how those layers work together. Most nontrivial enterprise applications use a layered architecture of some form, but in some situations other approaches, such as pipes and filters, are valuable. I don't go into those situations, focussing instead on the context of a layered architecture because it's the most widely useful. What patterns exist for building non-layered applications/parts of an application? Take a statistical modelling engine for a financial institution. There might be a layer for data access, but I expect that most of the code would be in a single layer. Would you still expect to see Gang of Four patterns in such a layer? How about a domain model? Would you use OO at all, or would it be purely functional? The quote mentions pipes and filters as alternate models to layers. I can easily imagine a such an engine using pipes as a way to break down the data processing. What other patterns exist? Are there common patterns for areas like task scheduling, results aggregation, or work distribution? What are some alternatives to MapReduce?

Read the article
How to pass common arguments to Perl modules

- by Leonard

I'm not thrilled with the argument-passing architecture I'm evolving for the (many) Perl scripts that have been developed for some scripts that call various Hadoop MapReduce jobs. There are currently 8 scripts (of the form run_something.pl) that are run from cron. (And more on the way ... we expect anywhere from 1 to 3 more for every function we add to hadoop.) Each of these have about 6 identical command-line parameters, and a couple command line parameters that are similar, all specified with Euclid. The implementations are in a dozen .pm modules. Some of which are common, and others of which are unique.... Currently I'm passing the args globally to each module ... Inside run_something.pl I have: set_common_args (%ARGV); set_something_args (%ARGV); And inside Something.pm I have sub set_something_args { (%MYARGS) =@_; } So then I can do if ( $MYARGS{'--needs_more_beer'} ) { $beer++; } I'm seeing that I'm probably going to have additional "common" files that I'll want to pass args to, so I'll have three or four set_xxx_args calls at the top of each run_something.pl, and it just doesn't seem too elegant. On the other hand, it beats passing the whole stupid argument array down the call chain, and choosing and passing individual elements down the call chain is (a) too much work (b) error-prone (c) doesn't buy much. In lots of ways what I'm doing is just object-oriented design without the object-oriented language trappings, and it looks uglier without said trappings, but nonetheless ... Anyone have thoughts or ideas?

Read the article
Short snippet summarizing a webpage?

- by Legend

Is there a clean way of grabbing the first few lines of a given link that summarizes that link? I have seen this being done in some online bookmarking applications but have no clue on how they were implemented. For instance, if I give this link, I should be able to get a summary which is roughly like: I'll admit it, I was intimidated by MapReduce. I'd tried to read explanations of it, but even the wonderful Joel Spolsky left me scratching my head. So I plowed ahead trying to build decent pipelines to process massive amounts of data Nothing complex at first sight but grabbing these is the challenging part. Just the first few lines of the actual post should be fine. Should I just use a raw approach of grabbing the entire html and parsing the meta tags or something fancy like that (which obviously and unfortunately is not generalizable to every link out there) or is there a smarter way to achieve this? Any suggestions? Update: I just found InstaPaper do this but am not sure if it is getting the information from RSS feeds or some other way.

Read the article
A generic C++ library that provides QtConcurrent functionality?

- by Lucas

QtConcurrent is awesome. I'll let the Qt docs speak for themselves: QtConcurrent includes functional programming style APIs for parallel list processing, including a MapReduce and FilterReduce implementation for shared-memory (non-distributed) systems, and classes for managing asynchronous computations in GUI applications. For instance, you give QtConcurrent::map() an iterable sequence and a function that accepts items of the type stored in the sequence, and that function is applied to all the items in the collection. This is done in a multi-threaded manner, with a thread pool equal to the number of logical CPU's on the system. There are plenty of other function in QtConcurrent, like filter(), filteredReduced() etc. The standard CompSci map/reduce functions and the like. I'm totally in love with this, but I'm starting work on an OSS project that will not be using the Qt framework. It's a library, and I don't want to force others to depend on such a large framework like Qt. I'm trying to keep external dependencies to a minimum (it's the decent thing to do). I'm looking for a generic C++ framework that provides me with the same/similar high-level primitives that QtConcurrent does. AFAIK boost has nothing like this (I may be wrong though). boost::thread is very low-level compared to what I'm looking for. I know C# has something very similar with their Parallel Extensions so I know this isn't a Qt-only idea. What do you suggest I use?

Read the article
MongoDB map reduce count giving more results than a query

- by giorgiosironi

I have a collection users in Mongo and I execute this map reduce which I believe is the equivalent of a COUNT(*) GROUP BY origin: > m = function() { for (i in this.membership) { ... emit( this.membership[i].platform_profile.origin, 1 ); ... } } function () { for (i in this.membership) { emit(this.membership[i].platform_profile.origin, 1); } } > r = function( id, values ) { var result = 0; ... for ( var i = 0; i < values.length; i ++ ) { result += values[i]; } ... return result; } function (id, values) { var result = 0; for (var i = 0; i < values.length; i++) { result += values[i]; } return result; } > db.users.mapReduce(m, r, {out : { inline: 1}}); { "results" : [ { "_id" : 0, "value" : 15 }, { "_id" : 1, "value" : 449 }, ... } But if I try to count how many documents have this field set to a specific value like 1, I get fewer results: db.users.count({"membership.platform_profile.origin": 1}); 424 What am I missing?

Read the article
Big Data – Role of Cloud Computing in Big Data – Day 11 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the NewSQL. In this article we will understand the role of Cloud in Big Data Story What is Cloud? Cloud is the biggest buzzword around from last few years. Everyone knows about the Cloud and it is extremely well defined online. In this article we will discuss cloud in the context of the Big Data. Cloud computing is a method of providing a shared computing resources to the application which requires dynamic resources. These resources include applications, computing, storage, networking, development and various deployment platforms. The fundamentals of the cloud computing are that it shares pretty much share all the resources and deliver to end users as a service. Examples of the Cloud Computing and Big Data are Google and Amazon.com. Both have fantastic Big Data offering with the help of the cloud. We will discuss this later in this blog post. There are two different Cloud Deployment Models: 1) The Public Cloud and 2) The Private Cloud Public Cloud Public Cloud is the cloud infrastructure build by commercial providers (Amazon, Rackspace etc.) creates a highly scalable data center that hides the complex infrastructure from the consumer and provides various services. Private Cloud Private Cloud is the cloud infrastructure build by a single organization where they are managing highly scalable data center internally. Here is the quick comparison between Public Cloud and Private Cloud from Wikipedia: Public Cloud Private Cloud Initial cost Typically zero Typically high Running cost Unpredictable Unpredictable Customization Impossible Possible Privacy No (Host has access to the data Yes Single sign-on Impossible Possible Scaling up Easy while within defined limits Laborious but no limits Hybrid Cloud Hybrid Cloud is the cloud infrastructure build with the composition of two or more clouds like public and private cloud. Hybrid cloud gives best of the both the world as it combines multiple cloud deployment models together. Cloud and Big Data – Common Characteristics There are many characteristics of the Cloud Architecture and Cloud Computing which are also essentially important for Big Data as well. They highly overlap and at many places it just makes sense to use the power of both the architecture and build a highly scalable framework. Here is the list of all the characteristics of cloud computing important in Big Data Scalability Elasticity Ad-hoc Resource Pooling Low Cost to Setup Infastructure Pay on Use or Pay as you Go Highly Available Leading Big Data Cloud Providers There are many players in Big Data Cloud but we will list a few of the known players in this list. Amazon Amazon is arguably the most popular Infrastructure as a Service (IaaS) provider. The history of how Amazon started in this business is very interesting. They started out with a massive infrastructure to support their own business. Gradually they figured out that their own resources are underutilized most of the time. They decided to get the maximum out of the resources they have and hence they launched their Amazon Elastic Compute Cloud (Amazon EC2) service in 2006. Their products have evolved a lot recently and now it is one of their primary business besides their retail selling. Amazon also offers Big Data services understand Amazon Web Services. Here is the list of the included services: Amazon Elastic MapReduce – It processes very high volumes of data Amazon DynammoDB – It is fully managed NoSQL (Not Only SQL) database service Amazon Simple Storage Services (S3) – A web-scale service designed to store and accommodate any amount of data Amazon High Performance Computing – It provides low-tenancy tuned high performance computing cluster Amazon RedShift – It is petabyte scale data warehousing service Google Though Google is known for Search Engine, we all know that it is much more than that. Google Compute Engine – It offers secure, flexible computing from energy efficient data centers Google Big Query – It allows SQL-like queries to run against large datasets Google Prediction API – It is a cloud based machine learning tool Other Players Besides Amazon and Google we also have other players in the Big Data market as well. Microsoft is also attempting Big Data with the Cloud with Microsoft Azure. Additionally Rackspace and NASA together have initiated OpenStack. The goal of Openstack is to provide a massively scaled, multitenant cloud that can run on any hardware. Thing to Watch The cloud based solutions provides a great integration with the Big Data’s story as well it is very economical to implement as well. However, there are few things one should be very careful when deploying Big Data on cloud solutions. Here is a list of a few things to watch: Data Integrity Initial Cost Recurring Cost Performance Data Access Security Location Compliance Every company have different approaches to Big Data and have different rules and regulations. Based on various factors, one can implement their own custom Big Data solution on a cloud. Tomorrow In tomorrow’s blog post we will discuss about various Operational Databases supporting Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
FairScheduling Conventions in Hadoop

- by dan.mcclary

While scheduling and resource allocation control has been present in Hadoop since 0.20, a lot of people haven't discovered or utilized it in their initial investigations of the Hadoop ecosystem. We could chalk this up to many things: Organizations are still determining what their dataflow and analysis workloads will comprise Small deployments under tests aren't likely to show the signs of strains that would send someone looking for resource allocation options The default scheduling options -- the FairScheduler and the CapacityScheduler -- are not placed in the most prominent position within the Hadoop documentation. However, for production deployments, it's wise to start with at least the foundations of scheduling in place so that you can tune the cluster as workloads emerge. To do that, we have to ask ourselves something about what the off-the-rack scheduling options are. We have some choices: The FairScheduler, which will work to ensure resource allocations are enforced on a per-job basis. The CapacityScheduler, which will ensure resource allocations are enforced on a per-queue basis. Writing your own implementation of the abstract class org.apache.hadoop.mapred.job.TaskScheduler is an option, but usually overkill. If you're going to have several concurrent users and leverage the more interactive aspects of the Hadoop environment (e.g. Pig and Hive scripting), the FairScheduler is definitely the way to go. In particular, we can do user-specific pools so that default users get their fair share, and specific users are given the resources their workloads require. To enable fair scheduling, we're going to need to do a couple of things. First, we need to tell the JobTracker that we want to use scheduling and where we're going to be defining our allocations. We do this by adding the following to the mapred-site.xml file in HADOOP_HOME/conf: <property> <name>mapred.jobtracker.taskScheduler</name> <value>org.apache.hadoop.mapred.FairScheduler</value> </property> <property> <name>mapred.fairscheduler.allocation.file</name> <value>/path/to/allocations.xml</value> </property> <property> <name>mapred.fairscheduler.poolnameproperty</name> <value>pool.name</value> </property> <property> <name>pool.name</name> <value>${user.name}</name> </property> What we've done here is simply tell the JobTracker that we'd like to task scheduling to use the FairScheduler class rather than a single FIFO queue. Moreover, we're going to be defining our resource pools and allocations in a file called allocations.xml For reference, the allocation file is read every 15s or so, which allows for tuning allocations without having to take down the JobTracker. Our allocation file is now going to look a little like this <?xml version="1.0"?> <allocations> <pool name="dan"> <minMaps>5</minMaps> <minReduces>5</minReduces> <maxMaps>25</maxMaps> <maxReduces>25</maxReduces> <minSharePreemptionTimeout>300</minSharePreemptionTimeout> </pool> <mapreduce.job.user.name="dan"> <maxRunningJobs>6</maxRunningJobs> </user> <userMaxJobsDefault>3</userMaxJobsDefault> <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout> </allocations> In this case, I've explicitly set my username to have upper and lower bounds on the maps and reduces, and allotted myself double the number of running jobs. Now, if I run hive or pig jobs from either the console or via the Hue web interface, I'll be treated "fairly" by the JobTracker. There's a lot more tweaking that can be done to the allocations file, so it's best to dig down into the description and start trying out allocations that might fit your workload.

Read the article
Big Data – Buzz Words: What is HDFS – Day 8 of 21

- by Pinal Dave

In yesterday’s blog post we learned what is MapReduce. In this article we will take a quick look at one of the four most important buzz words which goes around Big Data – HDFS. What is HDFS ? HDFS stands for Hadoop Distributed File System and it is a primary storage system used by Hadoop. It provides high performance access to data across Hadoop clusters. It is usually deployed on low-cost commodity hardware. In commodity hardware deployment server failures are very common. Due to the same reason HDFS is built to have high fault tolerance. The data transfer rate between compute nodes in HDFS is very high, which leads to reduced risk of failure. HDFS creates smaller pieces of the big data and distributes it on different nodes. It also copies each smaller piece to multiple times on different nodes. Hence when any node with the data crashes the system is automatically able to use the data from a different node and continue the process. This is the key feature of the HDFS system. Architecture of HDFS The architecture of the HDFS is master/slave architecture. An HDFS cluster always consists of single NameNode. This single NameNode is a master server and it manages the file system as well regulates access to various files. In additional to NameNode there are multiple DataNodes. There is always one DataNode for each data server. In HDFS a big file is split into one or more blocks and those blocks are stored in a set of DataNodes. The primary task of the NameNode is to open, close or rename files and directory and regulate access to the file system, whereas the primary task of the DataNode is read and write to the file systems. DataNode is also responsible for the creation, deletion or replication of the data based on the instruction from NameNode. In reality, NameNode and DataNode are software designed to run on commodity machine build in Java language. Visual Representation of HDFS Architecture Let us understand how HDFS works with the help of the diagram. Client APP or HDFS Client connects to NameSpace as well as DataNode. Client App access to the DataNode is regulated by NameSpace Node. NameSpace Node allows Client App to connect to the DataNode based by allowing the connection to the DataNode directly. A big data file is divided into multiple data blocks (let us assume that those data chunks are A,B,C and D. Client App will later on write data blocks directly to the DataNode. Client App does not have to directly write to all the node. It just has to write to any one of the node and NameNode will decide on which other DataNode it will have to replicate the data. In our example Client App directly writes to DataNode 1 and detained 3. However, data chunks are automatically replicated to other nodes. All the information like in which DataNode which data block is placed is written back to NameNode. High Availability During Disaster Now as multiple DataNode have same data blocks in the case of any DataNode which faces the disaster, the entire process will continue as other DataNode will assume the role to serve the specific data block which was on the failed node. This system provides very high tolerance to disaster and provides high availability. If you notice there is only single NameNode in our architecture. If that node fails our entire Hadoop Application will stop performing as it is a single node where we store all the metadata. As this node is very critical, it is usually replicated on another clustered as well as on another data rack. Though, that replicated node is not operational in architecture, it has all the necessary data to perform the task of the NameNode in the case of the NameNode fails. The entire Hadoop architecture is built to function smoothly even there are node failures or hardware malfunction. It is built on the simple concept that data is so big it is impossible to have come up with a single piece of the hardware which can manage it properly. We need lots of commodity (cheap) hardware to manage our big data and hardware failure is part of the commodity servers. To reduce the impact of hardware failure Hadoop architecture is built to overcome the limitation of the non-functioning hardware. Tomorrow In tomorrow’s blog post we will discuss the importance of the relational database in Big Data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Pre-rentrée Oracle Open World 2012 : à vos agendas

- by Eric Bezille

A maintenant moins d'un mois de l’événement majeur d'Oracle, qui se tient comme chaque année à San Francisco, fin septembre, début octobre, les spéculations vont bon train sur les annonces qui vont y être dévoilées... Et sans lever le voile, je vous engage à prendre connaissance des sujets des "Key Notes" qui seront tenues par Larry Ellison, Mark Hurd, Thomas Kurian (responsable des développements logiciels) et John Fowler (responsable des développements systèmes) afin de vous donner un avant goût. Stratégie et Roadmaps Oracle Bien entendu, au-delà des séances plénières qui vous donnerons une vision précise de la stratégie, et pour ceux qui seront sur place, je vous engage à ne pas manquer les séances d'approfondissement qui auront lieu dans la semaine, dont voici quelques morceaux choisis : "Accelerate your Business with the Oracle Hardware Advantage" avec John Fowler, le lundi 1er Octobre, 3:15pm-4:15pm "Why Oracle Softwares Runs Best on Oracle Hardware" , avec Bradley Carlile, le responsable des Benchmarks, le lundi 1er Octobre, 12:15pm-13:15pm "Engineered Systems - from Vision to Game-changing Results", avec Robert Shimp, le lundi 1er Octobre 1:45pm-2:45pm "Database and Application Consolidation on SPARC Supercluster", avec Hugo Rivero, responsable dans les équipes d'intégration matériels et logiciels, le lundi 1er Octobre, 4:45pm-5:45pm "Oracle’s SPARC Server Strategy Update", avec Masood Heydari, responsable des développements serveurs SPARC, le mardi 2 Octobre, 10:15am - 11:15am "Oracle Solaris 11 Strategy, Engineering Insights, and Roadmap", avec Markus Flier, responsable des développements Solaris, le mercredi 3 Octobre, 10:15am - 11:15am "Oracle Virtualization Strategy and Roadmap", avec Wim Coekaerts, responsable des développement Oracle VM et Oracle Linux, le lundi 1er Octobre, 12:15pm-1:15pm "Big Data: The Big Story", avec Jean-Pierre Dijcks, responsable du développement produits Big Data, le lundi 1er Octobre, 3:15pm-4:15pm "Scaling with the Cloud: Strategies for Storage in Cloud Deployments", avec Christine Rogers, Principal Product Manager, et Chris Wood, Senior Product Specialist, Stockage , le lundi 1er Octobre, 10:45am-11:45am Retours d'expériences et témoignages Si Oracle Open World est l'occasion de partager avec les équipes de développement d'Oracle en direct, c'est aussi l'occasion d'échanger avec des clients et experts qui ont mis en oeuvre nos technologies pour bénéficier de leurs retours d'expériences, comme par exemple : "Oracle Optimized Solution for Siebel CRM at ACCOR", avec les témoignages d'Eric Wyttynck, directeur IT Multichannel & CRM et Pascal Massenet, VP Loyalty & CRM systems, sur les bénéfices non seulement métiers, mais également projet et IT, le mercredi 3 Octobre, 1:15pm-2:15pm "Tips from AT&T: Oracle E-Business Suite, Oracle Database, and SPARC Enterprise", avec le retour d'expérience des experts Oracle, le mardi 2 Octobre, 11:45am-12:45pm "Creating a Maximum Availability Architecture with SPARC SuperCluster", avec le témoignage de Carte Wright, Database Engineer à CKI, le mercredi 3 Octobre, 11:45am-12:45pm "Multitenancy: Everybody Talks It, Oracle Walks It with Pillar Axiom Storage", avec le témoignage de Stephen Schleiger, Manager Systems Engineering de Navis, le lundi 1er Octobre, 1:45pm-2:45pm "Oracle Exadata for Database Consolidation: Best Practices", avec le retour d'expérience des experts Oracle ayant participé à la mise en oeuvre d'un grand client du monde bancaire, le lundi 1er Octobre, 4:45pm-5:45pm "Oracle Exadata Customer Panel: Packaged Applications with Oracle Exadata", animé par Tim Shetler, VP Product Management, mardi 2 Octobre, 1:15pm-2:15pm "Big Data: Improving Nearline Data Throughput with the StorageTek SL8500 Modular Library System", avec le témoignage du CTO de CSC, Alan Powers, le jeudi 4 Octobre, 12:45pm-1:45pm "Building an IaaS Platform with SPARC, Oracle Solaris 11, and Oracle VM Server for SPARC", avec le témoignage de Syed Qadri, Lead DBA et Michael Arnold, System Architect d'US Cellular, le mardi 2 Octobre, 10:15am-11:15am "Transform Data Center TCO with Oracle Optimized Servers: A Customer Panel", avec les témoignages notamment d'AT&T et Liberty Global, le mardi 2 Octobre, 11:45am-12:45pm "Data Warehouse and Big Data Customers’ View of the Future", avec The Nielsen Company US, Turkcell, GE Retail Finance, Allianz Managed Operations and Services SE, le lundi 1er Octobre, 4:45pm-5:45pm "Extreme Storage Scale and Efficiency: Lessons from a 100,000-Person Organization", le témoignage de l'IT interne d'Oracle sur la transformation et la migration de l'ensemble de notre infrastructure de stockage, mardi 2 Octobre, 1:15pm-2:15pm Echanges avec les groupes d'utilisateurs et les équipes de développement Oracle Si vous avez prévu d'arriver suffisamment tôt, vous pourrez également échanger dès le dimanche avec les groupes d'utilisateurs, ou tous les soirs avec les équipes de développement Oracle sur des sujets comme : "To Exalogic or Not to Exalogic: An Architectural Journey", avec Todd Sheetz - Manager of DBA and Enterprise Architecture, Veolia Environmental Services, le dimanche 30 Septembre, 2:30pm-3:30pm "Oracle Exalytics and Oracle TimesTen for Exalytics Best Practices", avec Mark Rittman, de Rittman Mead Consulting Ltd, le dimanche 30 Septembre, 10:30am-11:30am "Introduction of Oracle Exadata at Telenet: Bringing BI to Warp Speed", avec Rudy Verlinden & Eric Bartholomeus - Managers IT infrastructure à Telenet, le dimanche 30 Septembre, 1:15pm-2:00pm "The Perfect Marriage: Sun ZFS Storage Appliance with Oracle Exadata", avec Melanie Polston, directeur, Data Management, de Novation et Charles Kim, Managing Director de Viscosity, le dimanche 30 Septembre, 9:00am-10am "Oracle’s Big Data Solutions: NoSQL, Connectors, R, and Appliance Technologies", avec Jean-Pierre Dijcks et les équipes de développement Oracle, le lundi 1er Octobre, 6:15pm-7:00pm Testez et évaluez les solutions Et pour finir, vous pouvez même tester les technologies au travers du Oracle DemoGrounds, (1133 Moscone South pour la partie Systèmes Oracle, OS, et Virtualisation) et des "Hands-on-Labs", comme : "Deploying an IaaS Environment with Oracle VM", le mardi 2 Octobre, 10:15am-11:15am "Virtualize and Deploy Oracle Applications in Minutes with Oracle VM: Hands-on Lab", le mardi 2 Octobre, 11:45am-12:45pm (il est fortement conseillé d'avoir suivi le "Hands-on-Labs" précédent avant d'effectuer ce Lab. "x86 Enterprise Cloud Infrastructure with Oracle VM 3.x and Sun ZFS Storage Appliance", le mercredi 3 Octobre, 5:00pm-6:00pm "StorageTek Tape Analytics: Managing Tape Has Never Been So Simple", le mercredi 3 Octobre, 1:15pm-2:15pm "Oracle’s Pillar Axiom 600 Storage System: Power and Ease", le lundi 1er Octobre, 12:15pm-1:15pm "Enterprise Cloud Infrastructure for SPARC with Oracle Enterprise Manager Ops Center 12c", le lundi 1er Octobre, 1:45pm-2:45pm "Managing Storage in the Cloud", le mardi 2 Octobre, 5:00pm-6:00pm "Learn How to Write MapReduce on Oracle’s Big Data Platform", le lundi 1er Octobre, 12:15pm-1:15pm "Oracle Big Data Analytics and R", le mardi 2 Octobre, 1:15pm-2:15pm "Reduce Risk with Oracle Solaris Access Control to Restrain Users and Isolate Applications", le lundi 1er Octobre, 10:45am-11:45am "Managing Your Data with Built-In Oracle Solaris ZFS Data Services in Release 11", le lundi 1er Octobre, 4:45pm-5:45pm "Virtualizing Your Oracle Solaris 11 Environment", le mardi 2 Octobre, 1:15pm-2:15pm "Large-Scale Installation and Deployment of Oracle Solaris 11", le mercredi 3 Octobre, 3:30pm-4:30pm En conclusion, une semaine très riche en perspective, et qui vous permettra de balayer l'ensemble des sujets au coeur de vos préoccupations, de la stratégie à l'implémentation... Cette semaine doit se préparer, pour tailler votre agenda sur mesure, à travers les plus de 2000 sessions dont je ne vous ai fait qu'un extrait, et dont vous pouvez retrouver l'ensemble en ligne.

Read the article
Big Data Matters with ODI12c

- by Madhu Nair

contributed by Mike Eisterer On October 17th, 2013, Oracle announced the release of Oracle Data Integrator 12c (ODI12c). This release signifies improvements to Oracle’s Data Integration portfolio of solutions, particularly Big Data integration. Why Big Data = Big Business Organizations are gaining greater insights and actionability through increased storage, processing and analytical benefits offered by Big Data solutions. New technologies and frameworks like HDFS, NoSQL, Hive and MapReduce support these benefits now. As further data is collected, analytical requirements increase and the complexity of managing transformations and aggregations of data compounds and organizations are in need for scalable Data Integration solutions. ODI12c provides enterprise solutions for the movement, translation and transformation of information and data heterogeneously and in Big Data Environments through: The ability for existing ODI and SQL developers to leverage new Big Data technologies. A metadata focused approach for cataloging, defining and reusing Big Data technologies, mappings and process executions. Integration between many heterogeneous environments and technologies such as HDFS and Hive. Generation of Hive Query Language. Working with Big Data using Knowledge Modules ODI12c provides developers with the ability to define sources and targets and visually develop mappings to effect the movement and transformation of data. As the mappings are created, ODI12c leverages a rich library of prebuilt integrations, known as Knowledge Modules (KMs). These KMs are contextual to the technologies and platforms to be integrated. Steps and actions needed to manage the data integration are pre-built and configured within the KMs. The Oracle Data Integrator Application Adapter for Hadoop provides a series of KMs, specifically designed to integrate with Big Data Technologies. The Big Data KMs include: Check Knowledge Module Reverse Engineer Knowledge Module Hive Transform Knowledge Module Hive Control Append Knowledge Module File to Hive (LOAD DATA) Knowledge Module File-Hive to Oracle (OLH-OSCH) Knowledge Module Nothing to beat an Example: To demonstrate the use of the KMs which are part of the ODI Application Adapter for Hadoop, a mapping may be defined to move data between files and Hive targets. The mapping is defined by dragging the source and target into the mapping, performing the attribute (column) mapping (see Figure 1) and then selecting the KM which will govern the process. In this mapping example, movie data is being moved from an HDFS source into a Hive table. Some of the attributes, such as “CUSTID to custid”, have been mapped over. Figure 1 Defining the Mapping Before the proper KM can be assigned to define the technology for the mapping, it needs to be added to the ODI project. The Big Data KMs have been made available to the project through the KM import process. Generally, this is done prior to defining the mapping. Figure 2 Importing the Big Data Knowledge Modules Following the import, the KMs are available in the Designer Navigator. v\:* {behavior:url(#default#VML);} o\:* {behavior:url(#default#VML);} w\:* {behavior:url(#default#VML);} .shape {behavior:url(#default#VML);} Normal 0 false false false EN-US ZH-TW X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";} Figure 3 The Project View in Designer, Showing Installed IKMs Once the KM is imported, it may be assigned to the mapping target. This is done by selecting the Physical View of the mapping and examining the Properties of the Target. In this case MOVIAPP_LOG_STAGE is the target of our mapping. Figure 4 Physical View of the Mapping and Assigning the Big Data Knowledge Module to the Target Alternative KMs may have been selected as well, providing flexibility and abstracting the logical mapping from the physical implementation. Our mapping may be applied to other technologies as well. The mapping is now complete and is ready to run. We will see more in a future blog about running a mapping to load Hive. To complete the quick ODI for Big Data Overview, let us take a closer look at what the IKM File to Hive is doing for us. ODI provides differentiated capabilities by defining the process and steps which normally would have to be manually developed, tested and implemented into the KM. As shown in figure 5, the KM is preparing the Hive session, managing the Hive tables, performing the initial load from HDFS and then performing the insert into Hive. HDFS and Hive options are selected graphically, as shown in the properties in Figure 4. Figure 5 Process and Steps Managed by the KM What’s Next Big Data being the shape shifting business challenge it is is fast evolving into the deciding factor between market leaders and others. Now that an introduction to ODI and Big Data has been provided, look for additional blogs coming soon using the Knowledge Modules which make up the Oracle Data Integrator Application Adapter for Hadoop: Importing Big Data Metadata into ODI, Testing Data Stores and Loading Hive Targets Generating Transformations using Hive Query language Loading Oracle from Hadoop Sources For more information now, please visit the Oracle Data Integrator Application Adapter for Hadoop web site, http://www.oracle.com/us/products/middleware/data-integration/hadoop/overview/index.html Do not forget to tune in to the ODI12c Executive Launch webcast on the 12th to hear more about ODI12c and GG12c. Normal 0 false false false EN-US ZH-TW X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-size:10.0pt; font-family:"Calibri","sans-serif"; mso-bidi-font-family:"Times New Roman";}

Read the article
CodePlex Daily Summary for Thursday, March 29, 2012

CodePlex Daily Summary for Thursday, March 29, 2012Popular ReleasesWouter's SharePoint Demo Land: Custom Views and Forms: Samples showing how to provision custom lists. Custom view rendering with code, XSLT, XSLT + code. Custom forms with database lookups.YouTube API Class & Server Control for ASP.NET 4.0: Binary dll Files with Demo: . You can find binary dll files under :- Demo\bin\ folder Demo url :- http://demo.svinn.com/YouTube/Default.aspx 1 GB Hosting + Free Domain => http://ghosting.in/ .callisto: callisto 2.0.23: Patched Script static class and peak user count bug fix.Circuit Diagram: Circuit Diagram 2.0 Alpha 3: New in this release: Added components: Microcontroller Demultiplexer Flip & rotate components Open XML files from older versions of Circuit Diagram Text formatting for components New CDDX syntax Other fixesUmbraco CMS: Umbraco 5.1 CMS (Beta): Beta build for testing - please report issues at issues.umbraco.org (Latest uploaded: 5.1.0.123) What's new in 5.1? The full list of changes is on our http://progress.umbraco.org task tracking page. It shows items complete for 5.1, and 5.1 includes items for 5.0.1 and 5.0.2 listed there too. Here's two headline acts: Members5.1 adds support for backoffice editing of Members. We support the pairing up of our content type system in Hive with regular ASP.NET Membership providers (we ship a def...51Degrees.mobi - Mobile Device Detection and Redirection: 2.1.2.11: One Click Install from NuGet Changes to Version 2.1.2.11Code Changes 1. The project is now licenced under the Mozilla Public Licence 2. 2. User interface control and associated data access layer classes have been added to aid developers integrating 51Degrees.mobi into wider projects such as content management systems or web hosting management solutions. Use the following in a web form or user control to access these new UI components. <%@ Register Assembly="FiftyOne.Foundation" Namespace="...JSON Toolkit: JSON Toolkit 3.1: slight performance improvement (5% - 10%) new JsonException classPicturethrill: Version 2.3.28.0: Straightforward image selection. New clean UI look. Super stable. Simplified user experience.Indent Guides for Visual Studio: Indent Guides v12 (beta 2): Note This beta is likely to be less stable than the previous one. If you have severe troubles using this version, please report them with as much detail as possible (especially other extensions/addins that you may have) and downgrade to the last stable release. Version History Changed since v12 (beta 1): new options dialog with Quick Set selections for behavior restructured settings storage (should be more reliable) asynchronous background document analysis glow effect now appears in p...SQL Monitor - managing sql server performance: SQL Monitor 4.2 alpha 16: 1. finally fixed problem with logic fault checking for temporary table name... I really mean finally ...ScintillaNET: ScintillaNET 2.5: A slew of bug-fixes with a few new features sprinkled in. This release also upgrades the SciLexer and SciLexer64 DLLs to version 3.0.4. The official stuff: Issue # Title 32402 32402 27137 27137 31548 31548 30179 30179 24932 24932 29701 29701 31238 31238 26875 26875 30052 30052 Mugen MVVM Toolkit: Mugen MVVM Toolkit ver 1.1: Added Design mode support.Multiwfn: Multiwfn 2.3.2: Multiwfn 2.3.2Harness: Harness 2.0.2: change to .NET Framework Client Profile bug fix the download dialog auto answer. bug fix setFocus command. add "SendKeys" command. remove "closeAll" command. minor bugs fixed.BugNET Issue Tracker: BugNET 0.9.161: Below is a list of fixes in this release. Bug BGN-2092 - Link in Email "visit your profile" not functional BGN-2083 - Manager of bugnet can not edit project when it is not public BGN-2080 - clicking on a link in the project summary causes error (0.9.152.0) BGN-2070 - Missing Functionality On Feed.aspx BGN-2069 - Calendar View does not work BGN-2068 - Time tracking totals not ok BGN-2067 - Issues List Page Size Bug: Index was out of range. Must be non-negative and less than the si...YAF.NET (aka Yet Another Forum.NET): v1.9.6.1 RTW: v1.9.6.1 FINAL is .NET v4.0 ONLY v1.9.6.1 has: Performance Improvements .NET v4.0 improvements Improved FaceBook Integration KNOWN ISSUES WITH THIS RELEASE: ON INSTALL PLEASE DON'T CHECK "Upgrade BBCode Extensions...". More complete change list and discussion here: http://forum.yetanotherforum.net/yaf_postst14201_v1-9-6-1-RTW-Dated--3-26-2012.aspxmenu4web: menu4web 0.0.3: menu4web 0.0.3ArcGIS Editor for OpenStreetMap: ArcGIS Editor for OSM 2.0 Final: This release installs both the ArcGIS Editor for OSM Server Component and/or ArcGIS Editor for OSM Desktop components. The Desktop tools allow you to download data from the OpenStreetMap servers and store it locally in a geodatabase. You can then use the familiar editing environment of ArcGIS Desktop to create, modify, or delete data. Once you are done editing, you can post back the edit changes to OSM to make them available to all OSM users. The Server Component allows you to quickly create...Craig's Utility Library: Craig's Utility Library 3.1: This update adds about 60 new extension methods, a couple of new classes, and a number of fixes including: Additions Added DateSpan class Added GenericDelimited class Random additions Added static thread friendly version of Random.Next called ThreadSafeNext. AOP Manager additions Added Destroy function to AOPManager (clears out all data so system can be recreated. Really only useful for testing...) ORM additions Added PagedCommand and PageCount functions to ObjectBaseClass (same as M...DotSpatial: DotSpatial 1.1: This is a Minor Release. See the changes in the issue tracker. Minimal -- includes DotSpatial core and essential extensions Extended -- includes debugging symbols and additional extensions Just want to run the software? End user (non-programmer) version available branded as MapWindow Want to add your own feature? Develop a plugin, using the template and contribute to the extension feed (you can also write extensions that you distribute in other ways). Components are available as NuGet pa...New ProjectsAeroFichierAchats: Program that reworks excel files into PDFArepa: A lightweight non-invasive tool that helps you to implement Behavior Driven Development (BDD) on .NET projects. Arepa produces guidelines of using BDD on your current tests, and customisable and portable test reports integrating XML Documentation Comments. It's developed in C# using Scrum and BDD.Azure User Management Console - AUMC: Azure User Management Console - AUMC is a User Graphic Interface (GUI) that manages the users and logins of an Azure SQL database. The tool is simply converting your action into T-SQL commands and execute them on the Azure SQL Database. BaseProject: BaseProjectCraftBukkit Updater: I don't activly monitor the status of CraftBukkit's updates, but I want to stay on top of the updated versions. This will track the last version in cache then check for updates. Upon update there should be an option to download automaticly to location XYZ or to send an email/sms.Dan Dot Com: Dan's SandboxEjemplos Windows 8 Javascript: Este proyecto pretende recoger todos los ejemplos de aplicaciones metro para Windows 8 en JavaScript que vaya desarrollando para fines didácticos. Todas las colaboraciones son bienvenidas.EntityFramework Generic Patterns: EntityFramework Generic PatternsEpFamvir: EpFamvir is a Visual C++ software framework that supports data-intensive distributed applications under a GPL3.0 license. It enables applications to work with thousands of nodes and petabytes of data. Famvir was inspired by Apache Hadoop, Google's MapReduce, and Google FS.etdc/etp: ???????????????? ???????? ??????? etest.ru. IndieCiv: IndieCiv - An independent look at how a new game could be made using fan-made graphics from Civ3. You can visit the discussion at www.civfanatics.com http://forums.civfanatics.com/showthread.php?t=421582Libro SQL Server 2012: Databases and examples for my Italian book on "SQL Server 2012"Linq2Rest: Parses OData system query parameters to create a LINQ query that can be used to filter a model set. Also exposes a LINQ provider for web services supporting the OData query parameters. Use extension method Filter (in Linq2Rest namespace) on any IEnumerable source.LittleUmph: It's a little library to help you to alleviate some of the mundane stuff during your development. It has some nifty stuff like a neat database wrapper, conversion utilities, string functions and vast of other mini helpers to improve the efficiency and consistency of your code.LonghornRPG: LonghornRPGMemberManager: Keep track of membersMSU Student Teaching DataBase: A C# application to manage student placements for MSU.MvcContrib4: This is the MvcContrib project to support MVC 4MYFIRSTDEMOPROJECT: Introduction to CSharpOrchard Page Title Override: This Orchard CMS module allows a user to manage the Page Title. You can now have the Site Name show up last in the Page Title or hide the Site Name completely from the Page Title. The module also contains a Content Part allowing a Page Title to be separate from content title.phoenixtree: .net linux sql php python html javascript c exampleProject Explorer for Notepad++: This project mainly aims at providing a functional rich project explorer experience on notepad++. Features include Folder based file browsing, searching, filtering and more.Projeto Exemplo FPU: Projeto Exemplo FPUPureCalendar: simple web calendarrails study: rails studySafe IM: a safe IM software use AES,SHA,RSA.Contain a server and a clientScoreboard: Scoreboard is a simple sports and/or activities scoreboard which can be used for a variety of purposes It was developed for use with a 1024x768 projector and features home and away scores, countdown clock, even a buzzer. It's developed in VB.net.SharePoint System dates and editor fields updater: The SPChangeDate is a utility used to modify some of the system fields in a SharePoint 2010 document library using a datagridview (Excel Like). It allows the modification of the following SharePoint fields: 1. Created 2. Modified 3. Modified By 4. Created By Shi Ji Xiang CRM 2012: Shi Ji XiangSIGECAR: SIGECARStable Dependencies Principle Checker for nDepend: Small console app for parsing nDepend output files and find assemblies that breaks the Stable Dependencies PrincipleStudentsPointManager: StudentsPointManagertclh123test: 4testThai Sign Language Translator: Thai Sign Language Translator with Kinect Sensor.The House FM / My Praise FM Desktop App: An App to play Christian radio on the desktop of windows Vista and Windows 7 computersTradeSea: Uni ProjectVisual Dependency Tracer: This tool will help you understand how an excel formula is derived/calculated. It provides a visual representation (tree) of the formula, including the precendents and dependents.VK Video Player: Video player for russian social network vk.comXrmSvcToolkit: XrmSvcToolkit is a small JavaScript library that helps you access Microsoft Dynamics CRM 2011 web service interfaces (SOAP and REST).

Read the article
Hadoop hdfs namenode is throwing an error

- by KarmicDice

Full list of error: hb@localhost:/etc/hadoop/conf$ sudo service hadoop-hdfs-namenode start * Starting Hadoop namenode: starting namenode, logging to /var/log/hadoop-hdfs/hadoop-hdfs-namenode-localhost.out 12/09/10 14:41:09 INFO namenode.NameNode: STARTUP_MSG: /************************************************************ STARTUP_MSG: Starting NameNode STARTUP_MSG: host = localhost/127.0.0.1 STARTUP_MSG: args = [] STARTUP_MSG: version = 2.0.0-cdh4.0.1 STARTUP_MSG: classpath = /etc/hadoop/conf:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/usr/lib/hadoop/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop/lib/kfs-0.3.jar:/usr/lib/hadoop/lib/asm-3.2.jar:/usr/lib/hadoop/lib/commons-logging-api-1.1.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.23.jar:/usr/lib/hadoop/lib/stax-api-1.0.1.jar:/usr/lib/hadoop/lib/commons-configuration-1.6.jar:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/jersey-server-1.8.jar:/usr/lib/hadoop/lib/oro-2.0.8.jar:/usr/lib/hadoop/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop/lib/json-simple-1.1.jar:/usr/lib/hadoop/lib/snappy-java-1.0.3.2.jar:/usr/lib/hadoop/lib/commons-httpclient-3.1.jar:/usr/lib/hadoop/lib/log4j-1.2.15.jar:/usr/lib/hadoop/lib/servlet-api-2.5.jar:/usr/lib/hadoop/lib/jackson-xc-1.8.8.jar:/usr/lib/hadoop/lib/jersey-json-1.8.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop/lib/commons-collections-3.2.1.jar:/usr/lib/hadoop/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop/lib/jersey-core-1.8.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/lib/jsr305-1.3.9.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/activation-1.1.jar:/usr/lib/hadoop/lib/jaxb-impl-2.2.3-1.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.cloudera.1.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.23.jar:/usr/lib/hadoop/lib/commons-beanutils-1.7.0.jar:/usr/lib/hadoop/lib/commons-lang-2.5.jar:/usr/lib/hadoop/lib/commons-digester-1.8.jar:/usr/lib/hadoop/lib/commons-io-2.1.jar:/usr/lib/hadoop/lib/jsp-api-2.1.jar:/usr/lib/hadoop/lib/guava-11.0.2.jar:/usr/lib/hadoop/lib/jetty-6.1.26.cloudera.1.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/zookeeper-3.4.3-cdh4.0.1.jar:/usr/lib/hadoop/lib/avro-1.5.4.jar:/usr/lib/hadoop/lib/core-3.1.1.jar:/usr/lib/hadoop/lib/paranamer-2.3.jar:/usr/lib/hadoop/lib/jettison-1.1.jar:/usr/lib/hadoop/lib/jackson-jaxrs-1.8.8.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/hadoop/lib/commons-beanutils-core-1.8.0.jar:/usr/lib/hadoop/lib/commons-net-3.1.jar:/usr/lib/hadoop/lib/jaxb-api-2.2.2.jar:/usr/lib/hadoop/lib/commons-math-2.1.jar:/usr/lib/hadoop/lib/jline-0.9.94.jar:/usr/lib/hadoop/.//hadoop-annotations.jar:/usr/lib/hadoop/.//hadoop-annotations-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop/.//hadoop-auth-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop/.//hadoop-common.jar:/usr/lib/hadoop/.//hadoop-auth.jar:/usr/lib/hadoop/.//hadoop-common-2.0.0-cdh4.0.1-tests.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-hdfs/lib/snappy-java-1.0.3.2.jar:/usr/lib/hadoop-hdfs/lib/log4j-1.2.15.jar:/usr/lib/hadoop-hdfs/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/slf4j-api-1.6.1.jar:/usr/lib/hadoop-hdfs/lib/commons-logging-1.1.1.jar:/usr/lib/hadoop-hdfs/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-hdfs/lib/commons-daemon-1.0.3.jar:/usr/lib/hadoop-hdfs/lib/zookeeper-3.4.3-cdh4.0.1.jar:/usr/lib/hadoop-hdfs/lib/avro-1.5.4.jar:/usr/lib/hadoop-hdfs/lib/paranamer-2.3.jar:/usr/lib/hadoop-hdfs/lib/jline-0.9.94.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.0.1-tests.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop-hdfs/.//hadoop-hdfs.jar:/usr/lib/hadoop-yarn/lib/protobuf-java-2.4.0a.jar:/usr/lib/hadoop-yarn/lib/asm-3.2.jar:/usr/lib/hadoop-yarn/lib/netty-3.2.3.Final.jar:/usr/lib/hadoop-yarn/lib/javax.inject-1.jar:/usr/lib/hadoop-yarn/lib/jersey-server-1.8.jar:/usr/lib/hadoop-yarn/lib/jersey-guice-1.8.jar:/usr/lib/hadoop-yarn/lib/snappy-java-1.0.3.2.jar:/usr/lib/hadoop-yarn/lib/log4j-1.2.15.jar:/usr/lib/hadoop-yarn/lib/guice-3.0.jar:/usr/lib/hadoop-yarn/lib/jackson-mapper-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/junit-4.8.2.jar:/usr/lib/hadoop-yarn/lib/jackson-core-asl-1.8.8.jar:/usr/lib/hadoop-yarn/lib/jersey-core-1.8.jar:/usr/lib/hadoop-yarn/lib/jdiff-1.0.9.jar:/usr/lib/hadoop-yarn/lib/guice-servlet-3.0.jar:/usr/lib/hadoop-yarn/lib/aopalliance-1.0.jar:/usr/lib/hadoop-yarn/lib/commons-io-2.1.jar:/usr/lib/hadoop-yarn/lib/avro-1.5.4.jar:/usr/lib/hadoop-yarn/lib/paranamer-2.3.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-web-proxy-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-resourcemanager.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-common-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-server-nodemanager-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-api-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-common-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-applications-distributedshell.jar:/usr/lib/hadoop-yarn/.//hadoop-yarn-site-2.0.0-cdh4.0.1.jar:/usr/lib/hadoop-mapreduce/.//* STARTUP_MSG: build = file:///var/lib/jenkins/workspace/generic-package-ubuntu64-12-04/CDH4.0.1-Packaging-Hadoop-2012-06-28_17-01-57/hadoop-2.0.0+91-1.cdh4.0.1.p0.1~precise/src/hadoop-common-project/hadoop-common -r 4d98eb718ec0cce78a00f292928c5ab6e1b84695; compiled by 'jenkins' on Thu Jun 28 17:39:19 PDT 2012 ************************************************************/ 12/09/10 14:41:10 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-namenode.properties,hadoop-metrics2.properties hdfs-site.xml: hb@localhost:/etc/hadoop/conf$ cat hdfs-site.xml <?xml version="1.0" encoding="UTF-8"?>  <configuration> <property> <name>dfs.https.address</name> <value>localhost:50470</value> </property> <property> <name>dfs.https.port</name> <value>50470</value> </property> <property> <name>dfs.namenode.http-address</name> <value>localhost:50070</value> </property> <property> <name>dfs.replication</name> <value>1</value> </property> <property> <name>dfs.blocksize</name> <value>134217728</value> </property> <property> <name>dfs.client.use.datanode.hostname</name> <value>false</value> </property> </configuration>

Read the article
Cascading S3 Sink Tap not being deleted with SinkMode.REPLACE

- by Eric Charles

We are running Cascading with a Sink Tap being configured to store in Amazon S3 and were facing some FileAlreadyExistsException (see [1]). This was only from time to time (1 time on around 100) and was not reproducable. Digging into the Cascading codem, we discovered the Hfs.deleteResource() is called (among others) by the BaseFlow.deleteSinksIfNotUpdate(). Btw, we were quite intrigued with the silent NPE (with comment "hack to get around npe thrown when fs reaches root directory"). From there, we extended the Hfs tap with our own Tap to add more action in the deleteResource() method (see [2]) with a retry mechanism calling directly the getFileSystem(conf).delete. The retry mechanism seemed to bring improvement, but we are still sometimes facing failures (see example in [3]): it sounds like HDFS returns isDeleted=true, but asking directly after if the folder exists, we receive exists=true, which should not happen. Logs also shows randomly isDeleted true or false when the flow succeeds, which sounds like the returned value is irrelevant or not to be trusted. Can anybody bring his own S3 experience with such a behavior: "folder should be deleted, but it is not"? We suspect a S3 issue, but could it also be in Cascading or HDFS? We run on Hadoop Cloudera-cdh3u5 and Cascading 2.0.1-wip-dev. [1] org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory s3n://... already exists at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:132) at com.twitter.elephantbird.mapred.output.DeprecatedOutputFormatWrapper.checkOutputSpecs(DeprecatedOutputFormatWrapper.java:75) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:923) at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:882) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:882) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:856) at cascading.flow.hadoop.planner.HadoopFlowStepJob.internalNonBlockingStart(HadoopFlowStepJob.java:104) at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:174) at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:137) at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:122) at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:42) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.j [2] @Override public boolean deleteResource(JobConf conf) throws IOException { LOGGER.info("Deleting resource {}", getIdentifier()); boolean isDeleted = super.deleteResource(conf); LOGGER.info("Hfs Sink Tap isDeleted is {} for {}", isDeleted, getIdentifier()); Path path = new Path(getIdentifier()); int retryCount = 0; int cumulativeSleepTime = 0; int sleepTime = 1000; while (getFileSystem(conf).exists(path)) { LOGGER .info( "Resource {} still exists, it should not... - I will continue to wait patiently...", getIdentifier()); try { LOGGER.info("Now I will sleep " + sleepTime / 1000 + " seconds while trying to delete {} - attempt: {}", getIdentifier(), retryCount + 1); Thread.sleep(sleepTime); cumulativeSleepTime += sleepTime; sleepTime *= 2; } catch (InterruptedException e) { e.printStackTrace(); LOGGER .error( "Interrupted while sleeping trying to delete {} with message {}...", getIdentifier(), e.getMessage()); throw new RuntimeException(e); } if (retryCount == 0) { getFileSystem(conf).delete(getPath(), true); } retryCount++; if (cumulativeSleepTime > MAXIMUM_TIME_TO_WAIT_TO_DELETE_MS) { break; } } if (getFileSystem(conf).exists(path)) { LOGGER .error( "We didn't succeed to delete the resource {}. Throwing now a runtime exception.", getIdentifier()); throw new RuntimeException( "Although we waited to delete the resource for " + getIdentifier() + ' ' + retryCount + " iterations, it still exists - This must be an issue in the underlying storage system."); } return isDeleted; } [3] INFO [pool-2-thread-15] (BaseFlow.java:1287) - [...] at least one sink is marked for delete INFO [pool-2-thread-15] (BaseFlow.java:1287) - [...] sink oldest modified date: Wed Dec 31 23:59:59 UTC 1969 INFO [pool-2-thread-15] (HiveSinkTap.java:148) - Now I will sleep 1 seconds while trying to delete s3n://... - attempt: 1 INFO [pool-2-thread-15] (HiveSinkTap.java:130) - Deleting resource s3n://... INFO [pool-2-thread-15] (HiveSinkTap.java:133) - Hfs Sink Tap isDeleted is true for s3n://... ERROR [pool-2-thread-15] (HiveSinkTap.java:175) - We didn't succeed to delete the resource s3n://... Throwing now a runtime exception. WARN [pool-2-thread-15] (Cascade.java:706) - [...] flow failed: ... java.lang.RuntimeException: Although we waited to delete the resource for s3n://... 0 iterations, it still exists - This must be an issue in the underlying storage system. at com.qubit.hive.tap.HiveSinkTap.deleteResource(HiveSinkTap.java:179) at com.qubit.hive.tap.HiveSinkTap.deleteResource(HiveSinkTap.java:40) at cascading.flow.BaseFlow.deleteSinksIfNotUpdate(BaseFlow.java:971) at cascading.flow.BaseFlow.prepare(BaseFlow.java:733) at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:761) at cascading.cascade.Cascade$CascadeJob.call(Cascade.java:710) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619)

Read the article

Search Results

Search found 196 results on 8 pages for 'mapreduce'.

Page 8/8 | < Previous Page | 4 5 6 7 8

- by Pinal Dave

- by Simon Elliston Ball

- by thegreeneman

- by user1419563

- by tgamblin

- by Ganesh

- by Neil Kodner

- by Brian

- by Paul Stovell

- by Leonard

- by Legend

- by Lucas

- by giorgiosironi

- by Pinal Dave

- by dan.mcclary

- by Pinal Dave

- by Eric Bezille

- by Madhu Nair

- by KarmicDice

- by Eric Charles

< Previous Page | 4 5 6 7 8