cluster analysis - Page 49

I'm thinking n+1 in a hyper-v r2 cluster managed by scvmm is not a great idea anymore

- by tony roth

Around here the clusters (not Hyper-V clusters) are typically configured as N+1, so they are asking me to create N+1 Hyper-V R2 clusters. These will be configured with both CSVs and live migration and managed via SCVMM R2. My thinking is that it's a waste to have a node sitting there idle. In my opinion it would be better to spread the headroom that would traditionally be the +1 server across the N nodes. Does anybody have an opinion on this? Thanks
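The headroom idea in this question can be put into numbers. The sketch below is only an illustration under a simplifying assumption that is not in the post: load is spread evenly and can be redistributed evenly across the surviving nodes. The function name is made up for the example.

```python
def max_safe_utilization(nodes: int, failures_tolerated: int = 1) -> float:
    """Highest average per-node load (0..1) that still lets the surviving
    nodes absorb the work of `failures_tolerated` failed nodes."""
    if nodes <= failures_tolerated:
        raise ValueError("need more nodes than tolerated failures")
    return (nodes - failures_tolerated) / nodes


# All nodes active; "nodes" is the total count, i.e. N+1 in the post's terms.
for total in (3, 4, 6, 8):
    ceiling = max_safe_utilization(total)
    print(f"{total} active nodes: keep each below {ceiling:.0%} to survive one failure")
```

Under that assumption, a 4-node cluster with every node active and held below 75% has the same spare capacity as 3 busy nodes plus 1 idle one, but the spare capacity is doing useful work in normal operation.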

SC10 Now Accepting Submissions

Student Cluster Competition submissions also open.

eSTEP TechCast - December 2012

- by uwes

Dear partner,

We are pleased to announce our next eSTEP TechCast on Thursday, 6 December, and would be happy if you could join. The details for the next TechCast are below.

Date and time: Thursday, 6 December 2012, 11:00 - 12:00 GMT (12:00 - 13:00 CET; 15:00 - 16:00 GST)
Title: Innovations with Oracle Solaris Cluster 4
Abstract: Oracle Solaris Cluster 4.0 is the version of Solaris Cluster that runs with Oracle Solaris 11. In this webcast we will focus on the integration of the cluster software with the IPS packaging system of Solaris 11, which makes installing and updating the software much easier and much more reliable, especially with virtualization technologies involved. Our webcast will also cover new versions of Oracle Solaris Cluster if they are announced in the meantime.
Target audience: Tech Presales
Speaker: Hartmut Streppel

Call Info:
Call-in-toll-free number: 08006948154 (United Kingdom)
Call-in-toll-free number: +44-2081181001 (United Kingdom)
Conference Code: 803 594 3
Security Passcode: 9876

Webex Info (Oracle Web Conference):
Meeting Number: 255 760 510
Meeting Password: tech2011

Playback / Recording / Archive: The webcasts will be recorded and will be available shortly after the event in the eSTEP portal under the Events tab, where you can also find material from previously delivered eSTEP TechCasts. Use your email address and PIN: eSTEP_2011 to get access. Feel free to have a look. We are happy to get your comments and feedback.

Thanks and best regards,
Partner HW Enablement EMEA

What are best monitoring tool customizable for cluster / distributed system?

- by Adil

I am working on a system with multiple servers. I am interested in monitoring server-specific data such as CPU/memory usage, disk/filesystem usage, network traffic and system load, as well as some data specific to my own processes. What open source tools are available that can serve my purpose? Ideally the tool would let me customize the parameters to be monitored and monitor my own data by writing a plugin or agent. Any suggestions? I have heard of Nagios, Zabbix and Pandora, but I am not sure whether they provide such an interface.
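As one illustration of what a custom check can look like, here is a minimal sketch that follows the widely used Nagios plugin convention: print one status line (optionally followed by `|` and performance data) and signal the state through the exit code, where 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. The thresholds, function names and the choice of disk usage as the metric are assumptions made for the example, not something from the question.

```python
import shutil

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(percent_used: float, warn: float = 80.0, crit: float = 90.0) -> int:
    """Map a usage percentage to a plugin state (thresholds are example values)."""
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def check_disk(path: str = "/", warn: float = 80.0, crit: float = 90.0):
    """Return (exit_status, status_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    percent = 100.0 * usage.used / usage.total
    status = classify(percent, warn, crit)
    perfdata = f"used={percent:.1f}%;{warn:g};{crit:g}"
    return status, f"DISK {LABELS[status]} - {path} is {percent:.1f}% used | {perfdata}"


def main() -> int:
    status, line = check_disk(".")
    print(line)
    return status
```

To use it as a real plugin, the script would end with `raise SystemExit(main())` so that the monitoring server sees the exit code. The same pattern works for process-specific data: replace `check_disk` with any function that measures your own metric.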

Impact of Server Failure on Coherence Request Processing

- by jpurdy

Requests against a given cache server may be temporarily blocked for several seconds following the failure of other cluster members. This may cause issues for applications that cannot tolerate multi-second response times even during failover processing (ignoring for the moment that in practice a variety of issues make such absolute guarantees challenging even when there are no server failures).

In general, Coherence is designed around the principle that failures in one member should not affect the rest of the cluster if at all possible. However, if the failed member was managing a piece of state that another member depends on, the second member will need to wait until a new member assumes responsibility for managing that state. This transfer of responsibility is (as of Coherence 3.7) performed by the primary service thread for each cache service. The finest possible granularity for transferring responsibility is a single partition, so the question becomes how to minimize the time spent processing each partition. Here are some optimizations that may reduce this period:

- Reduce the size of each partition (by increasing the partition count)
- Increase the number of JVMs across the cluster (increasing the total number of primary service threads)
- Increase the number of CPUs across the cluster (making sure that each JVM has a CPU core when needed)
- Re-evaluate the set of configured indexes (as these will need to be rebuilt when a partition moves)
- Make sure that the backing map is as fast as possible (in most cases this means running on-heap)
- Make sure that the cluster is running on hardware with fast CPU cores (since the partition processing is single-threaded)

As always, proper testing is required to make sure that configuration changes have the desired effect (and also to quantify that effect).

What is Google Page Rank and Why is it So Important?

What exactly is PageRank? It is a link analysis algorithm influenced by citation analysis, which Eugene Garfield developed in the 1950s, and later by the work of Massimo Marchiori. The algorithm assigns a numerical weight to each document in a set of hyperlinked documents, and the result is reported as a score between zero and ten.
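To make "link analysis" concrete, here is a small sketch of the simplified textbook form of the idea: each page repeatedly passes a share of its score to the pages it links to. It is not Google's production algorithm; the damping factor of 0.85, the iteration count and the four-page graph are assumptions chosen for illustration, and the raw scores sum to 1 rather than being scaled to a zero-to-ten range.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the small "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by every other page.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

Page "c" ends up with the highest score because it collects links from all the others, and "a" comes second because the highly ranked "c" links to it.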

How failover should work in IIS cluster with Application Request Routing?

- by username

I have set up several servers with IIS and connected them to a load balancer, which is a server with IIS Application Request Routing (ARR) installed. I have created a server farm and added two servers. Then I stopped IIS on the first server and tried to open my web site. It returned an error: 502 - Web server received an invalid response while acting as a gateway or proxy server. But if I shut down the first server instead of stopping IIS, I get a response from the next server that is online. The question is: what is the expected failover behaviour with ARR? Should it switch me to the next server if IIS is stopped but the server itself is online?

SSRS 2005 - Usability analysis - Is SSRS a good option for this scenario?

- by Sach

How practical is it to consider SSRS 2005 or SSRS 2008 as a reporting solution for reports with millions of records (anywhere from 3 to 10 million)? Is there any threshold on the size of a report in SSRS? How do I know whether, for a huge report, SSRS will consume all the memory and start paging operations to disk, or give a memory leak error? Even if I keep increasing the memory, how can I be sure that a given amount will be sufficient for such huge reports on the report server? All these questions are haunting me because I have a dedicated report server with decent hardware and OS configuration (8 processors, 8GB RAM, 64-bit OS and 64-bit SQL Server 2005). Still, my report with around 2 million records takes more than 6 minutes, and going from one page to another takes 3 minutes! My data source is on a separate server, and when I execute only the stored procedure there, it returns the results in less than 2 minutes.

How to analyze how many bytes each instruction takes in assembly?

- by Mask

    0x004012d0 <main+0>: push %ebp
    0x004012d1 <main+1>: mov %esp,%ebp
    0x004012d3 <main+3>: sub $0x28,%esp

If the address is not available, can we calculate it ourselves?
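When the addresses are available, the size of each instruction is simply the difference between its address and the address of the next one. The sketch below applies that to the three lines from the question; the parsing regex and function name are assumptions made for the example.

```python
import re

listing = """\
0x004012d0 <main+0>: push %ebp
0x004012d1 <main+1>: mov %esp,%ebp
0x004012d3 <main+3>: sub $0x28,%esp
"""

LINE = re.compile(r"\s*(0x[0-9a-fA-F]+)\s+<[^>]*>:\s*(.*)")


def instruction_sizes(text):
    """Return (address, instruction, size) tuples; size is None for the
    last line because there is no following address to subtract from."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            rows.append((int(match.group(1), 16), match.group(2).strip()))
    result = []
    for index, (address, instruction) in enumerate(rows):
        if index + 1 < len(rows):
            size = rows[index + 1][0] - address
        else:
            size = None
        result.append((address, instruction, size))
    return result


for address, instruction, size in instruction_sizes(listing):
    shown = "unknown" if size is None else f"{size} byte(s)"
    print(f"{address:#010x}  {instruction:<18} {shown}")
```

This gives 1 byte for `push %ebp` and 2 bytes for `mov %esp,%ebp`. The last instruction needs either the next address or the raw encoding; in GDB, `disassemble /r` prints the instruction bytes alongside the mnemonics. Without any addresses, the sizes have to come from the x86 encoding rules themselves, which is what an assembler or disassembler does.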

How can the reliability of Software be checked through analysis?

- by goutham

How can we analyze software reliability? How do we check the reliability of any application or product?

Know of any Java garbage collection log analysis tools?

- by braveterry

I'm looking for a tool or a script that will take the console log from my web app, parse out the garbage collection information and display it in a meaningful way. I'm starting up on a Sun Java 1.4.2 JVM with the following flags: -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails The log output looks like this: 54.736: [Full GC 54.737: [Tenured: 172798K->18092K(174784K), 2.3792658 secs] 257598K->18092K(259584K), [Perm : 20476K->20476K(20480K)], 2.4715398 secs] Making sense of a few hundred of these kinds of log entries would be much easier if I had a tool that would visually graph garbage collection trends.
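As a starting point for a home-grown script, the sketch below parses the exact log line quoted above with regular expressions and pulls out the timestamp, the heap occupancy before and after collection, and the pause time. The patterns are written against that one sample line only; other collectors and JVM versions format their output differently, so treat the regexes as assumptions to adapt rather than a general parser.

```python
import re

SAMPLE = (
    "54.736: [Full GC 54.737: [Tenured: 172798K->18092K(174784K), 2.3792658 secs] "
    "257598K->18092K(259584K), [Perm : 20476K->20476K(20480K)], 2.4715398 secs]"
)

HEADER = re.compile(r"^(?P<time>\d+\.\d+): \[(?P<kind>Full GC|GC)")
GENERATION = re.compile(
    r"\[(?P<name>\w+)\s*: (?P<before>\d+)K->(?P<after>\d+)K\((?P<capacity>\d+)K\)"
)
HEAP = re.compile(r"\] (?P<before>\d+)K->(?P<after>\d+)K\((?P<capacity>\d+)K\)")
PAUSE = re.compile(r"(?P<secs>\d+\.\d+) secs\]\s*$")


def parse_gc_line(line):
    """Return a dict describing one GC event, or None if the line does not match."""
    header, heap, pause = HEADER.search(line), HEAP.search(line), PAUSE.search(line)
    if not (header and heap and pause):
        return None
    return {
        "time": float(header.group("time")),
        "kind": header.group("kind"),
        "heap_before_kb": int(heap.group("before")),
        "heap_after_kb": int(heap.group("after")),
        "heap_capacity_kb": int(heap.group("capacity")),
        "pause_secs": float(pause.group("secs")),
        "generations": {
            m.group("name"): (int(m.group("before")), int(m.group("after")))
            for m in GENERATION.finditer(line)
        },
    }


event = parse_gc_line(SAMPLE)
freed_mb = (event["heap_before_kb"] - event["heap_after_kb"]) / 1024
print(f"{event['time']:>8.3f}s {event['kind']}: freed {freed_mb:.0f} MB "
      f"in {event['pause_secs']:.2f}s")
```

Running `parse_gc_line` over every line of the console log yields a list of events that can be written out as CSV and charted in a spreadsheet or plotting tool.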

What are some good tools for performance analysis with asp.net apps?

- by Sarah Nasir

Are there some good tools that can analyze performance issues in an ASP.NET application? I googled and found a couple, like dotTrace and stimulustechnology. Has anyone used a better one? Thanks

How do I calculate a cosinor analysis in SPSS, HELP!

- by Jared

How can I put this into SPSS? http://www.cbi.dongnocchi.it/glossary/Cosinor.html I am trying to calculate the MESOR for a cyclic pattern of circadian rhythm.
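For a single known period, the cosinor model y(t) = M + A*cos(2*pi*t/period + phi) can be rewritten as an ordinary linear regression, y = M + beta*cos(w*t) + gamma*sin(w*t), where M is the MESOR, A = sqrt(beta^2 + gamma^2) and phi = atan2(-gamma, beta). The sketch below shows that calculation in plain Python on synthetic data; the 24-hour period, the sample values and the function names are assumptions for the example, and it is not SPSS syntax.

```python
import math


def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= factor * m[col][k]
    x = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):
        known = sum(m[row][k] * x[k] for k in range(row + 1, 3))
        x[row] = (m[row][3] - known) / m[row][row]
    return x


def cosinor(times, values, period=24.0):
    """Least-squares fit; returns (mesor, amplitude, acrophase in radians)."""
    omega = 2.0 * math.pi / period
    design = [(1.0, math.cos(omega * t), math.sin(omega * t)) for t in times]
    # Normal equations (X^T X) p = X^T y for p = (M, beta, gamma).
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(design, values)) for i in range(3)]
    mesor, beta, gamma = solve_3x3(xtx, xty)
    return mesor, math.hypot(beta, gamma), math.atan2(-gamma, beta)


# Synthetic two-day series sampled every 2 hours (made-up values).
hours = list(range(0, 48, 2))
readings = [10.0 + 3.0 * math.cos(2.0 * math.pi * t / 24.0 - 1.0) for t in hours]
mesor, amplitude, acrophase = cosinor(hours, readings)
print(f"MESOR={mesor:.2f} amplitude={amplitude:.2f} acrophase={acrophase:.2f} rad")
```

Because the model is linear in M, beta and gamma, the same fit can be done in any statistics package by computing a cos(w*t) column and a sin(w*t) column and running an ordinary linear regression on them; the intercept is the MESOR.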

How to do a cost-benefit analysis for platform-level features?

- by Callister Park

I work on a development team that works closely with Product Managers. There is mutual agreement between the developers and Product Managers that there should be a business case behind every feature the development team builds. My question is, what is an effective way to make a business case for platform-level features that have higher up front cost but will provide long term benefits? For example, the development team would like to implement a plug-in framework. There is the higher up-front cost to implement a plug-in framework but delivering the subsequent features as plug-ins will be cheaper in the long run. This is obvious to everyone including the Product Managers. Is there a standard/simple way to express the cost-benefits? Is there a simple way to visualize it with a graph?
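One simple way to express the trade-off is a break-even calculation: compare the cumulative cost of building features directly with the cumulative cost of paying for the framework first and then building cheaper plug-ins. The figures below (30 developer-days for the framework, 4 days per plug-in, 10 days per stand-alone feature) are invented purely for illustration, as are the function names.

```python
def cumulative_cost(upfront, per_feature, features):
    """Total cost after delivering `features` features."""
    return upfront + per_feature * features


def break_even_feature_count(upfront, cost_with, cost_without, max_features=1000):
    """First feature count at which the up-front investment is strictly cheaper."""
    for count in range(1, max_features + 1):
        if cumulative_cost(upfront, cost_with, count) < cumulative_cost(0, cost_without, count):
            return count
    return None  # the investment never pays off within max_features


FRAMEWORK_DAYS, PLUGIN_DAYS, STANDALONE_DAYS = 30, 4, 10  # hypothetical estimates

print("features  without  with-framework")
for count in range(0, 9):
    without = cumulative_cost(0, STANDALONE_DAYS, count)
    with_fw = cumulative_cost(FRAMEWORK_DAYS, PLUGIN_DAYS, count)
    print(f"{count:>8}  {without:>7}  {with_fw:>14}")

answer = break_even_feature_count(FRAMEWORK_DAYS, PLUGIN_DAYS, STANDALONE_DAYS)
print(f"The framework pays for itself from feature {answer} onward")
```

Plotting the two cost columns against the feature count gives two straight lines that cross at the break-even point, which is usually the easiest picture to show a Product Manager.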

High availability virtual machines

- by Jeremy

I've been reading a lot about high availability virtualization, either via Hyper-V or VMware. In that context, high availability essentially means that the VM is hosted by a cluster of physical servers (nodes), so if one of the physical servers goes down, the VM can still be served by the other physical servers. So far so good: the physical cluster and the VM itself are highly available. However, the service being provided, let's say SQL Server, MSDTC, or any other service, is actually provided by the VM image and the virtualized operating system. So I imagine that there is still a point of failure at the virtual layer that isn't accounted for. Something could happen within the virtual machine itself that the physical cluster cannot account for, correct? In that instance the physical failover cluster (Hyper-V) or VMware host cannot fail over, because the issue is not with one of the servers in the physical cluster, so failing over a physical node would not do any good. Does this necessitate building a virtual failover cluster on top of the physical one, or is this not necessary? Alternatively, I suppose you could skip the physical clustering and just cluster at the virtual layer (child-based failover clustering), because that should still survive a physical failure. See image below showing parent based (left), child based (right) and a combination (center). Is parent based as far as you need to go, or is child based more appropriate?

.NET Math Libraries

- by JoshReuben

.NET Mathematical Libraries

.NET Builder for MATLAB
The MathWorks Inc. - http://www.mathworks.com/products/netbuilder/
- MATLAB Builder NE generates MATLAB-based .NET and COM components
- royalty-free deployment
- creates the components by encrypting MATLAB functions and generating either a .NET or COM wrapper around them

.NET/Link for Mathematica
www.wolfram.com
- a product that integrates Mathematica and Microsoft's .NET platform in both directions
- call .NET from Mathematica: use arbitrary .NET types directly from the Mathematica language
- use and control the Mathematica kernel from a .NET program
- turns Mathematica into a scripting shell to leverage the computational services of Mathematica
- write custom front ends for Mathematica or use Mathematica as a computational engine for another program
- comes with full source code
- leverages MathLink, Wolfram Research's protocol for sending data and commands back and forth between Mathematica and other programs; .NET/Link abstracts the low-level details of the MathLink C API

Extreme Optimization
http://www.extremeoptimization.com/
- a collection of general-purpose mathematical and statistical classes built for the .NET Framework; it combines a math library, a vector and matrix library, and a statistics library in one package
- download the trial of version 4.0 to try it out
- multi-core ready: full support for Task Parallel Library features, including cancellation
- broad base of algorithms covering a wide range of numerical techniques, including linear algebra (BLAS and LAPACK routines), numerical analysis (integration and differentiation) and equation solvers

Mathematics (leverages parallelism using .NET 4.0's Task Parallel Library)
- Basic math: complex numbers, 'special functions' like Gamma and Bessel functions, numerical differentiation.
- Solving equations: solve equations in one variable, or solve systems of linear or nonlinear equations.
- Curve fitting: linear and nonlinear curve fitting, cubic splines, polynomials, orthogonal polynomials.
- Optimization: find the minimum or maximum of a function in one or more variables, linear programming and mixed integer programming.
- Numerical integration: compute integrals over finite or infinite intervals, over 2D and higher dimensional regions. Integrate systems of ordinary differential equations (ODEs).
- Fast Fourier Transforms: 1D and 2D FFTs using managed or fast native code (32 and 64 bit).
- BigInteger, BigRational, and BigFloat: perform operations with arbitrary precision.

Vector and Matrix Library
- Real and complex vectors and matrices.
- Single and double precision for elements.
- Structured matrix types, including triangular, symmetrical and band matrices.
- Sparse matrices.
- Matrix factorizations: LU decomposition, QR decomposition, singular value decomposition, Cholesky decomposition, eigenvalue decomposition.
- Portability and performance: calculations can be done in 100% managed code, or in hand-optimized processor-specific native code (32 and 64 bit).

Statistics
- Data manipulation: sort and filter data, process missing values, remove outliers, etc. Supports .NET data binding.
- Statistical models: simple, multiple, nonlinear, logistic and Poisson regression. Generalized Linear Models. One and two-way ANOVA.
- Hypothesis tests: 14 hypothesis tests, including the z-test, t-test, F-test, runs test, and more advanced tests, such as the Anderson-Darling test for normality, one and two-sample Kolmogorov-Smirnov test, and Levene's test for homogeneity of variances.
- Multivariate statistics: K-means cluster analysis, hierarchical cluster analysis, principal component analysis (PCA), multivariate probability distributions.
- Statistical distributions: 29 continuous and discrete statistical distributions, including uniform, Poisson, normal, lognormal, Weibull and Gumbel (extreme value) distributions.
- Random numbers: random variates from any distribution, 4 high-quality random number generators, low discrepancy sequences, shufflers.

New in version 4.0 (November 2010)
- Support for .NET Framework Version 4.0 and Visual Studio 2010
- TPL parallelized, multicore ready
- Sparse linear program solver: can solve problems with more than 1 million variables.
- Mixed integer linear programming using a branch and bound algorithm.
- Special functions: hypergeometric, Riemann zeta, elliptic integrals, Fresnel functions, Dawson's integral.
- Full set of window functions for FFTs.

Product                                                          Price   Update subscription
Single Developer License                                         $999    $399
Team License (3 developers)                                      $1999   $799
Department License (8 developers)                                $3999   $1599
Site License (unlimited developers in one physical location)     $7999   $3199

NMath
http://www.centerspace.net
.NET math and statistics libraries:
- matrix and vector classes
- random number generators
- Fast Fourier Transforms (FFTs)
- numerical integration
- linear programming
- linear regression
- curve and surface fitting
- optimization
- hypothesis tests
- analysis of variance (ANOVA)
- probability distributions
- principal component analysis
- cluster analysis
- built on the Intel Math Kernel Library (MKL), which contains highly optimized, extensively threaded versions of BLAS (Basic Linear Algebra Subroutines) and LAPACK (Linear Algebra PACKage)

Product                          Price   Update subscription
Single Developer License         $1295   $388
Team License (5 developers)      $5180   $1554

DotNumerics
http://www.dotnumerics.com/NumericalLibraries/Default.aspx
Free. DotNumerics is a website dedicated to numerical computing for .NET that includes a C# Numerical Library for .NET containing algorithms for linear algebra, differential equations and optimization problems. The linear algebra library includes CSLapack, CSBlas and CSEispack, ports from Fortran to C# of LAPACK, BLAS and EISPACK, respectively.
- Linear Algebra (CSLapack, CSBlas and CSEispack): systems of linear equations, eigenvalue problems, least-squares solutions of linear systems and singular value problems.
- Differential Equations: initial-value problems for nonstiff and stiff ordinary differential equations (explicit Runge-Kutta, implicit Runge-Kutta, Gear's BDF and Adams-Moulton).
- Optimization: unconstrained and bounded constrained optimization of multivariate functions (L-BFGS-B, Truncated Newton and Simplex methods).

Math.NET Numerics
http://numerics.mathdotnet.com/
Free. An open source numerical library that includes special functions, linear algebra, probability models, random numbers, interpolation and integral transforms. It is a merger of dnAnalytics with Math.NET Iridium. In addition to a purely managed implementation, it will also support native hardware optimization.
- constants and special functions
- complex type support
- real and complex, dense and sparse linear algebra (with LU, QR, eigenvalues, ... decompositions)
- non-uniform probability distributions, multivariate distributions, sample generation
- alternative uniform random number generators
- descriptive statistics, including order statistics
- various interpolation methods, including barycentric approaches and splines
- numerical function integration (quadrature) routines
- integral transforms, like the Fourier transform (FFT) with support for arbitrary lengths, and Hartley; spectral-space aware sequence manipulation (signal processing)
- combinatorics, polynomials, quaternions, basic number theory
- parallelized where appropriate, to leverage multi-core and multi-processor systems
- fully managed or (if available) using native libraries (Intel MKL, ACML, CUDA, FFTW)
- provides a native facade for F# developers

What's safe to assume about the NSMutableArray / NSArray class cluster?

- by andyvn22

I know you shouldn't use this to decide whether or not to change an array: if ([possiblyMutable isKindOfClass:[NSMutableArray class]]) But say I'm writing a method and need to return either an NSMutableArray or an NSArray, depending on the mutability of possiblyMutable. The class using my method already knows whether or not it's acceptable to change the returned array. Whether or not it's acceptable to change the returned array directly correlates with whether or not it's acceptable to change possiblyMutable. In that specific case, is this code safe? It seems to me that if it's not acceptable to change the array, but we accidentally get a mutable array, it's ok, because the class using my method won't try to change it. And if it is acceptable to change the array, then we will always get possiblyMutable as an NSMutableArray (though this is the part I'm not entirely clear on). So... safe or not? Alternatives?

Force memcached to write to all servers in pool

- by Industrial

Hi everyone, I have thought a bit about how to make sure that a particular key is distributed to ALL memcached servers in a pool. My current, untested solution is to make another instance of memcached, something like this:

    $cluster['local'] = array('host' => '192.168.1.1', 'port' => '11211', 'weight' => 50);

    foreach ($this->cluster() as $cluster) {
        @$this->tempMemcache = new Memcache;
        @$this->tempMemcache->connect($cluster['host'], $cluster['port']);
        @$this->tempMemcache->set($key, $value, $this->compress, $expireTime);
        @$this->tempMemcache->close();
    }

What is the sensible thing to do in this case, when certain keys need to be stored on ALL servers for reliability?
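The same "write to every server individually" idea can be sketched in Python without assuming any client library, by speaking memcached's text protocol directly: a `set <key> <flags> <exptime> <bytes>` line, the data block, and a `STORED` reply on success. The function names and the server list are hypothetical, and a real deployment would also need to decide what to do when some servers accept the write and others do not.

```python
import socket


def build_set_command(key: str, value: bytes, exptime: int = 0, flags: int = 0) -> bytes:
    """Encode one memcached text-protocol 'set' request."""
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def set_on_all(servers, key, value, exptime=0, timeout=1.0):
    """Send the same 'set' to every (host, port) pair, bypassing key hashing.

    Returns a dict mapping each server to True if it replied STORED.
    """
    command = build_set_command(key, value, exptime)
    results = {}
    for host, port in servers:
        try:
            with socket.create_connection((host, port), timeout=timeout) as conn:
                conn.sendall(command)
                reply = conn.recv(64)
            results[(host, port)] = reply.startswith(b"STORED")
        except OSError:
            # Unreachable or timed-out servers are reported, not ignored.
            results[(host, port)] = False
    return results


print(build_set_command("config:version", b"42", exptime=300))
```

Calling `set_on_all([("192.168.1.1", 11211), ...], "config:version", b"42")` returns the per-server outcome, which makes partial failures visible instead of suppressing them as the `@` operator does in the PHP draft. Reads then have to target a specific server too, because a pooled client will still hash the key to just one of them.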

Big Data – Basics of Big Data Analytics – Day 18 of 21

- by Pinal Dave

In yesterday's blog post we learned the importance of the various components in the Big Data Story. In this article we will look at the various analytics tasks we try to achieve with Big Data and list the important tools in the Big Data Story.

When you have plenty of data around you, what is the first thing that comes to your mind? "What does all this data mean?" Exactly. The same thought comes to my mind as well. I always want to know what all the data means and what meaningful information I can get out of it. Most Big Data projects are built to retrieve the intelligence that all this data contains.

Let us take the example of Facebook. When I look at my friends list on Facebook, I always want to ask many questions, such as:

- On which date do most of my friends have a birthday?
- What is the favorite film of most of my friends, so I can talk about it and engage them?
- What is the most liked place to travel among my friends?
- Which is the most disliked cuisine for my friends in India and the USA, so that when they travel I do not take them there?

There are many more questions I can think of. This illustrates how important it is to analyze Big Data. Here are a few of the kinds of analysis you can use with Big Data.

Slicing and Dicing: This means breaking down your data into smaller sets and understanding them one set at a time. It also helps to present information in a variety of user-digestible ways. For example, if you have data related to movies, you can slice and dice the data in various ways, such as by actor or by movie length.

Real Time Monitoring: This is crucial in social media when an event is happening and you want to measure its impact while it is happening. For example, if you are using Twitter during a football match, you can watch what fans are saying about the match on Twitter while it is being played.

Anomaly Prediction and Modeling: If the business is running normally, all is well, but if there are signs of trouble, everyone wants to know about them early. Big Data analysis of various patterns can be very helpful in predicting the future. It may not always be accurate, but certain hints and signals can be very useful. For example, lots of data can help conclude that a lot of rain increases the sale of umbrellas.

Text and Unstructured Data Analysis: Unstructured data is becoming the norm in the new world, and it is a big part of the Big Data revolution. It is very important that we Extract, Transform and Load the unstructured data and make meaningful data out of it. For example, by analyzing lots of images, one can predict that people like to wear certain colors in certain months.

Big Data Analytics Solutions

There are many different Big Data Analytics Solutions on the market. It is impossible to list all of them, so I will list a few of them here.

- Tableau: This has to be one of the most popular visualization tools in the big data market.
- SAS: A high performance analytics and infrastructure company.
- IBM and Oracle: They have a range of tools for Big Data Analysis.

Tomorrow

In tomorrow's blog post we will discuss a very important component of the Big Data Ecosystem: the Data Scientist.

Reference: Pinal Dave (http://blog.sqlauthority.com)

SQL SERVER – What are Actions in SSAS and How to Make a Reporting Action

- by Pinal Dave

Actions are used for customized browsing and drilling of data by the end user. An action is an event that a user can raise while accessing the cube data. Actions are used in cube browsers like Excel and are triggered when a user in a client tool clicks on a particular member, level, dimension, cell, or perhaps the cube itself. For example, a user might be able to see a Reporting Services report, open a web page, or drill through to detailed information related to the cube data.

Analysis Services supports 3 types of actions:

- Report
- Drill-through
- Standard Actions

In this blog post, I will explain the Reporting action. The objective of this action is to return a report with details of the product where the sales amount is greater than 10,000 in cube browser analysis. You need to create a basic cube first with the facts and dimensions you want in the analysis. Following are the steps to create a reporting action.

1.) Go to SQL Server Data Tools and open the Analysis Services project. Navigate to Actions and click on New Reporting Action.
2.) Specify the name of the action and choose the target type as attribute members, since we have to create the action on members of an attribute.
3.) Specify the target object of your report action. The target object is the dimension or attribute on which you want the report to appear. In our case it is product name.
4.) Next you have to define the condition on which you want the report link to appear. This is an optional feature. In this example we are specifying a condition that checks whether the sales amount is greater than 10,000, so that the link appears only for those products where the defined condition is met.
5.) Next you have to specify the name of the server on which the report is present, the report path and the report format in which you want the report to appear.
6.) Additionally you can specify the parameters. As with the conditional expression, the parameters should be valid MDX expressions. The parameter name should be the same as the one defined in the report.
7.) Deploy your solution after you are done specifying parameters and go to the cube browser.
8.) Click on the Analyze in Excel button; this will open your cube in Excel.
9.) Make an analysis which shows product names and their sales amounts.
10.) Right-click on a product where the sales amount is greater than 10,000 and you will see the reporting action link. Click on it and you will be taken to your Reporting Services report.
11.) Clicking on the link will take you to the URL of the report. I created this report using the Report Project Wizard in SQL Server Data Tools.

So, this is how we can launch reports from a cube browser. Similarly, you can open web pages, run applications and perform a number of other tasks. Koenig Solutions offers SSAS training which covers all of Analysis Services, including Reporting, in great detail. In my next blog post I will talk about drill-through actions.

Author: Namita Sharma, Senior Corporate Trainer at Koenig Solutions.

Reference: Pinal Dave (http://blog.sqlauthority.com)

Some OBI EE Tricks and Tips in the Admin Tool By Gerry Langton

- by hamsun

How to set the log level from a Session variable Initialization block

As we know, it is normal to set the log level to a non-zero value for a particular user when we wish to debug problems. However, it is sometimes inconvenient to go into each user's properties in the Admin tool and update the log level. So I am showing a method which allows the log level to be set for all users via a session initialization block. This is particularly useful for anyone wanting an alternative way to set the log level. The screen shots shown use the OBIEE 11g SampleApp demo but are applicable to any environment.

Open the appropriate rpd in online mode and navigate to Manage Variables. Select Session Initialization Blocks, right-click in the white space and create a New Initialization Block. I called the Initialization block Set_Loglevel.

Now click on 'Edit Data Source' to enter the SQL. Choose the 'Use OBI EE Server' option for the SQL. This means that the SQL provided must use tables which have been defined in the Physical layer of the RPD, and whilst there is no need to provide a connection pool, you must work in online mode. The SQL can access any of the RPD tables and is used purely to return a value of 2. The 'Test' button confirms that the SQL is valid.

Next, click on the 'Edit Data Target' button to add the LOGLEVEL variable to the initialization block. Check the 'Enable any user to set the value' option so that this will work for any user. Click OK and the following message will display, as LOGLEVEL is a system session variable: Click 'Yes'. Click 'OK' to save the Initialization block. Then check in the online changes.

To test that LOGLEVEL has been set, log in to OBIEE using an administrative login (e.g. weblogic) and reload server metadata, either from the Analysis editor or from the Administration > Reload Files and Metadata link. Run a query, then navigate to Administration > Manage Sessions and click 'View Log' for the query just issued (which should be approximately the last in the list). A log file should exist and, with LOGLEVEL set to 2, should include both logical and physical SQL. If more diagnostic information is required, set LOGLEVEL to a higher value.

If logging is required only for a particular analysis, an alternative method can be used directly from the Analysis editor. Edit the analysis for which debugging is required and click on the Advanced tab. Scroll down to the Advanced SQL clauses section and enter the following in the Prefix box: SET VARIABLE LOGLEVEL = 2; Click the 'Apply SQL' button. The SET VARIABLE statement will now prefix the Analysis's logical SQL, so that any time this analysis is run it will produce a log.

You can find information about training for Oracle BI EE products here or in the OU Learning Paths. Please send me an email at [email protected] if you have any further questions.

About the Author: Gerry Langton started at Siebel Systems in 1999, working as a technical instructor teaching both Siebel application development and Siebel Analytics (which subsequently became Oracle BI EE). Since 2006 Gerry has worked as Senior Principal Instructor within Oracle University, specialising in Oracle BI EE, Oracle BI Publisher and Oracle Data Warehouse development for BI.

Big Data – Interacting with Hadoop – What is Sqoop? – What is Zookeeper? – Day 17 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of Pig and Pig Latin in the Big Data story. In this article we will look at Sqoop and ZooKeeper, two of the most important components to learn when interacting with Hadoop.

What is Sqoop?

Most businesses store their data in an RDBMS as well as in other data warehouse solutions. They need a way to move that data into Hadoop for processing and to return the results from Hadoop to the RDBMS. The data movement can happen in real time or in bulk at various intervals. We need a tool that can move this data from SQL to Hadoop and from Hadoop to SQL. Sqoop (SQL to Hadoop) is such a tool: it extracts data from non-Hadoop data sources, transforms it into a format that Hadoop can use, and then loads it into HDFS. Essentially it is an ETL tool that extracts, transforms and loads from SQL to Hadoop. The best part is that it also works in the other direction, extracting data from Hadoop and loading it into non-Hadoop data stores such as an RDBMS. Sqoop is driven entirely from the command line. Behind the scenes it creates MapReduce jobs to import data from an external database into HDFS. It is a very effective tool and easy to learn, even for nonprogrammers.

What is ZooKeeper?

ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. In other words, ZooKeeper is a replicated synchronization service with eventual consistency. In simpler words: a Hadoop cluster has many different nodes, and one node is the master. Let us assume that the master node fails for any reason. In this case, the role of the master node has to be transferred to a different node. The main role of the master node is to manage the writers, because writes have to be applied in a consistent order. In this kind of scenario ZooKeeper assigns a new master node and makes sure that the Hadoop cluster keeps running without any glitch. ZooKeeper is Hadoop’s method of coordinating all the elements of these distributed systems. Here are a few of the tasks ZooKeeper is responsible for:

- ZooKeeper coordinates the starting and stopping of the various nodes in the Hadoop cluster.
- When a process in the Hadoop cluster needs certain configuration to complete its task, ZooKeeper makes sure that the node receives the necessary configuration consistently.
- If the master node fails, ZooKeeper can assign a new master node and make sure the cluster works as expected.

ZooKeeper performs many other tasks related to the Hadoop cluster and its communication. Without a coordination service such as ZooKeeper, designing a new fault-tolerant distributed application is considerably harder.

Tomorrow

In tomorrow’s blog post we will discuss a very important component of the Big Data ecosystem: Big Data Analytics.

Reference: Pinal Dave (http://blog.sqlauthority.com)
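The master failover described in the ZooKeeper section can be illustrated with a small sketch. This is an in-memory toy and not the ZooKeeper API: the `ToyCoordinator` class, its methods, and the node names are all hypothetical. It assumes one common election scheme, in which every live node registers and receives an increasing sequence number, and the live node with the lowest number acts as master.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyCoordinator:
    """In-memory stand-in for a coordination service (not the ZooKeeper API)."""
    _next_seq: int = 0
    _members: Dict[int, str] = field(default_factory=dict)  # sequence number -> node name

    def register(self, node: str) -> int:
        """Add a live node and hand it the next sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._members[seq] = node
        return seq

    def fail(self, node: str) -> None:
        """Drop every registration held by a node that has died."""
        self._members = {s: n for s, n in self._members.items() if n != node}

    def master(self) -> Optional[str]:
        """The live node with the lowest sequence number is the master."""
        if not self._members:
            return None
        return self._members[min(self._members)]


coordinator = ToyCoordinator()
for name in ("node-a", "node-b", "node-c"):
    coordinator.register(name)

first_master = coordinator.master()   # node-a registered first
coordinator.fail("node-a")            # simulate the master failing
second_master = coordinator.master()  # the role moves to the next node in line
print(first_master, "->", second_master)
```

The point of the sketch is only that the surviving nodes never have to negotiate among themselves: the coordinator already holds an ordering, so the next master is unambiguous as soon as the old one disappears.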

Read the article

Know your Data Lineage

- by Simon Elliston Ball

An academic paper without the footnotes isn’t an academic paper. Journalists wouldn’t base a news article on facts that they can’t verify. So why would anyone publish reports without being able to say where the data has come from and be confident of its quality, in other words, without knowing its lineage (sometimes referred to as ‘provenance’ or ‘pedigree’)?

The number and variety of data sources, both traditional and new, increase inexorably. Data comes clean or dirty, processed or raw, unimpeachable or entirely fabricated. On its journey to our report, from its source, the data can travel through a network of interconnected pipes, passing through numerous distinct systems, each managed by different people. At each point along the pipeline, it can be changed, filtered, aggregated and combined. When the data finally emerges, how can we be sure that it is right? How can we be certain that no part of the data collection was based on incorrect assumptions, that key data points haven’t been left out, or that the sources are good? Even when we’re using data science to give us an approximate or probable answer, we cannot have any confidence in the results without confidence in the data from which they came.

You need to know what has been done to your data, where it came from, and who is responsible for each stage of the analysis. This information represents your data lineage; it is your stack-trace. If you’re an analyst, suspicious of a number, it tells you why the number is there and how it got there. If you’re a developer, working on a pipeline, it provides the context you need to track down the bug. If you’re a manager, or an auditor, it lets you know the right things are being done. Lineage tracking is part of good data governance.

Most audit and lineage systems require you to buy into their whole structure. If you are using Hadoop for your data storage and processing, then tools like Falcon allow you to track lineage, as long as you are using Falcon to write and run the pipeline. It can mean learning a new way of running your jobs (or using some sort of proxy), and even a distinct way of writing your queries. Other Hadoop tools provide a lot of operational and audit information, spread throughout the many logs produced by Hive, Sqoop, MapReduce and all the various moving parts that make up the eco-system. To get a full picture of what’s going on in your Hadoop system you need to capture both Falcon lineage and the data-exhaust of other tools that Falcon can’t orchestrate.

However, the problem is bigger even than that. Often, Hadoop is just one piece in a larger processing workflow. The next step of the challenge is how you bind together the lineage metadata describing what happened before and after Hadoop, where ‘after’ could be a data analysis environment like R, an application, or even an end-user tool such as Tableau or Excel. One possibility is to push as much as you can of your key analytics into Hadoop, but would you give up the power and familiarity of your existing tools in return for a reliable way of tracking lineage?

Lineage and auditing should work consistently, automatically and quietly, allowing users to access their data with any tool they need to use. The real solution, therefore, is to create a consistent method for bringing lineage data from these various disparate sources into the data analysis platform that you use, rather than being forced to use the tool that manages the pipeline for the lineage and a different tool for the data analysis. The key is to keep your logs, keep your audit data, from every source, bring them together and use the data analysis tools to trace the paths from raw data to the answer that data analysis provides.
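As a rough illustration of that last point, here is a minimal sketch of tracing a report back to its raw sources once audit records from several tools have been brought together. Everything in it is an assumption made for the example: the event format, the dataset names, the owners, and the helper functions `trace_lineage` and `raw_sources` do not come from Falcon or any other real tool.

```python
from typing import Dict, List, Set

# Hypothetical audit events, already collected from the logs of different tools.
# Each event records which datasets a step read and which one it wrote.
events: List[Dict] = [
    {"tool": "sqoop", "owner": "ingest-team", "inputs": ["crm.orders"], "output": "hdfs:/raw/orders"},
    {"tool": "hive", "owner": "data-eng", "inputs": ["hdfs:/raw/orders"], "output": "hive.orders_clean"},
    {"tool": "sqoop", "owner": "ingest-team", "inputs": ["crm.customers"], "output": "hdfs:/raw/customers"},
    {"tool": "R", "owner": "analyst", "inputs": ["hive.orders_clean", "hdfs:/raw/customers"], "output": "report.revenue"},
]


def trace_lineage(target: str, events: List[Dict]) -> List[Dict]:
    """Walk backwards from a dataset and return every step that contributed to it."""
    producers = {event["output"]: event for event in events}
    steps: List[Dict] = []
    seen: Set[str] = set()
    stack = [target]
    while stack:
        dataset = stack.pop()
        if dataset in seen or dataset not in producers:
            continue  # already visited, or a raw source that no recorded step produced
        seen.add(dataset)
        step = producers[dataset]
        steps.append(step)
        stack.extend(step["inputs"])
    return steps


def raw_sources(target: str, events: List[Dict]) -> Set[str]:
    """Inputs in the lineage of target that were not written by any recorded step."""
    produced = {event["output"] for event in events}
    return {
        name
        for step in trace_lineage(target, events)
        for name in step["inputs"]
        if name not in produced
    }


for step in trace_lineage("report.revenue", events):
    print(step["output"], "<-", step["inputs"], "by", step["tool"], "/", step["owner"])
print(sorted(raw_sources("report.revenue", events)))
```

Because the events are plain records, the same trace works whether a step ran inside Hadoop or in a downstream tool, which is the consistency the article argues for.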

Read the article

Consolidation in a Database Cloud

- by B R Clouse

Consolidation of multiple databases onto a shared infrastructure is the next step after Standardization. The potential consolidation density is a function of the extent to which the infrastructure is shared. The three models provide increasing degrees of sharing:

- Server: each database is deployed in a dedicated VM. Hardware is shared, but most of the software infrastructure is not. Standardization is often applied incompletely since operating environments can be moved as-is onto the shared platform. The potential for VM sprawl is an additional downside.
- Database: multiple database instances are deployed on a shared software / hardware infrastructure. This model is very efficient and easily implemented with the features in the Oracle Database and supporting products. Many customers have moved to this model and achieved significant, measurable benefits.
- Schema: multiple schemas are deployed within a single database instance. The most efficient model, it places constraints on the environment. Usually this model will be implemented only by customers deploying their own applications.

(Note that a single deployment can combine Database and Schema consolidations.)

Customer value: lower costs, better system utilization

In this phase of the maturity model, under-utilized hardware can be used to host more workloads, or retired and those workloads migrated to consolidation platforms. Customers benefit from higher utilization of the hardware resources, resulting in reduced data center floor space, and lower power and cooling costs. And, the OpEx savings from Standardization are multiplied, since there are fewer physical components (both hardware and software) to manage.

Customer value: higher productivity

The OpEx benefits from Standardization are compounded since not only are there fewer types of things to manage, now there are fewer entities to manage. In this phase, customers discover that their IT staff has time to move away from "day-to-day" tasks and start investing in higher value activities. Database users benefit from consolidating onto shared infrastructures by relieving themselves of the requirement to maintain their own dedicated servers. Also, if the shared infrastructure offers capabilities such as High Availability / Disaster Recovery, which are often beyond the budget and skillset of a standalone database environment, then moving to the consolidation platform can provide access to those capabilities, resulting in less downtime.

Capabilities / Characteristics

In this phase, customers will typically deploy fixed-size clusters and consolidate on a cluster until that cluster is deemed "full," at which point a new cluster is built. Customers will define one or a few cluster architectures that are used wherever possible; occasionally there may be deployments which must be handled as exceptions. The "full" policy may be based on number of databases deployed on the cluster, or observed peak workload, etc. IT will own the provisioning of new databases on a cluster, making the decision of when and where to place new workloads. Resources may be managed dynamically, e.g., as a priority workload increases, it may be given more CPU and memory to handle the spike. Users will be charged at a fixed, relatively coarse level; or in some cases, no charging will be applied.

Activities / Tasks

Oracle offers several tools to plan a successful consolidation. Real Application Testing (RAT) has a feature to help plan and validate database consolidations. Enterprise Manager 12c's Cloud Management Pack for Database includes a planning module. Looking ahead, customers should start planning for the Services phase by defining the Service Catalog that will be made available for database services.
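The "full" policy and placement decision described under Capabilities / Characteristics can be sketched in a few lines. This is only an illustration under stated assumptions: the two thresholds, the `Cluster` class, and the `place_database` function are hypothetical and are not part of any Oracle product.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical policy limits; a real site would choose its own.
MAX_DATABASES = 20    # cap on databases deployed per cluster
MAX_PEAK_CPU = 0.75   # cap on observed peak CPU utilization (0.0 to 1.0)


@dataclass
class Cluster:
    name: str
    databases: List[str] = field(default_factory=list)
    peak_cpu: float = 0.0  # observed peak utilization

    def is_full(self) -> bool:
        """A cluster is 'full' when either policy limit has been reached."""
        return len(self.databases) >= MAX_DATABASES or self.peak_cpu >= MAX_PEAK_CPU


def place_database(db: str, clusters: List[Cluster]) -> Cluster:
    """Put db on the first cluster that is not full; otherwise build a new cluster."""
    for cluster in clusters:
        if not cluster.is_full():
            cluster.databases.append(db)
            return cluster
    new_cluster = Cluster(name=f"cluster-{len(clusters) + 1}", databases=[db])
    clusters.append(new_cluster)
    return new_cluster


# cluster-1 holds only two databases, but its observed peak is over the limit.
clusters = [Cluster("cluster-1", databases=["hr", "crm"], peak_cpu=0.82)]
chosen = place_database("billing", clusters)
print(chosen.name, [c.name for c in clusters])
```

Combining the two criteria with "or" means a cluster with few but very busy databases is treated as full just like one with many quiet ones, which matches the idea that either measure can define the policy.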

Read the article

XNA running slow when making a texture

- by Anthony

I'm using XNA to test an image analysis algorithm for a robot. I made a simple 3D world that has grass, a robot, and white lines (which represent the course). The image analysis algorithm is a modification of the Hough line detection algorithm. I have the game render 2 camera views to a render target in memory. One camera is a top-down view of the robot going around the course, and the second camera is the view from the robot's perspective as it moves along. I take the render target of the robot camera and convert it to a Color[,] so that I can do image analysis on it.

    private Color[,] TextureTo2DArray(Texture2D texture, Color[] colors1D, Color[,] colors2D)
    {
        texture.GetData(colors1D);
        for (int x = 0; x < texture.Width; x++)
        {
            for (int y = 0; y < texture.Height; y++)
            {
                colors2D[x, y] = colors1D[x + (y * texture.Width)];
            }
        }
        return colors2D;
    }

I want to overlay the results of the image analysis on the robot camera view. The first part of the image analysis is finding the white pixels. When I find the white pixels I create a bool[,] array showing which pixels were white and which were black. Then I want to convert it back into a texture so that I can overlay it on the robot view. When I try to create the new texture showing which pixels were white, the game goes super slow (around 10 Hz). Can you give me some pointers on how to make the game go faster? If I comment out this algorithm, it goes back up to 60 Hz.

    private Texture2D GenerateTexturesFromBoolArray(bool[,] boolArray, Color[] colorMap, Texture2D textureToModify)
    {
        for (int i = 0; i < screenWidth; i++)
        {
            for (int j = 0; j < screenHeight; j++)
            {
                if (boolArray[i, j] == true)
                {
                    colorMap[i + (j * screenWidth)] = Color.Red;
                }
                else
                {
                    colorMap[i + (j * screenWidth)] = Color.Transparent;
                }
            }
        }
        textureToModify.SetData<Color>(colorMap);
        return textureToModify;
    }

Each time I run Draw, I must set the texture to null so that I can modify it.

    public override void Draw(GameTime gameTime)
    {
        Vector2 topRightVector = ((SimulationMain)Game).spriteRectangleManager.topRightVector;
        Vector2 scaleFactor = ((SimulationMain)Game).config.scaleFactorScreenSizeToWindow;

        this.spriteBatch.Begin(); // Start the 2D drawing
        this.spriteBatch.Draw(this.textureFindWhite, topRightVector, null, Color.White, 0, Vector2.Zero, scaleFactor, SpriteEffects.None, 0);
        this.spriteBatch.End(); // Stop drawing.

        GraphicsDevice.Textures[0] = null;
    }

Thanks for the help, Anthony G.

Search Results

Search found 4291 results on 172 pages for 'cluster analysis'.
