percona xtradb cluster - Page 45

How can I implement MapReduce using shell commands?

- by alex

How do you execute a Unix shell command (e.g awk one liner) on a cluster in parallel (step 1) and collect the results back to a central node (step 2)? Update: I've just found http://blog.last.fm/2009/04/06/mapreduce-bash-script It seems to do exactly what I need.

Read the article

Scipy Negative Distance? What?

- by disappearedng

I have a input file which are all floating point numbers to 4 decimal place. i.e. 13359 0.0000 0.0000 0.0001 0.0001 0.0002` 0.0003 0.0007 ... (the first is the id). My class uses the loadVectorsFromFile method which multiplies it by 10000 and then int() these numbers. On top of that, I also loop through each vector to ensure that there are no negative values inside. However, when I perform _hclustering, I am continually seeing the error, "Linkage Z contains negative values". I seriously think this is a bug because: I checked my values, the values are no where small enough or big enough to approach the limits of the floating point numbers and the formula that I used to derive the values in the file uses absolute value (my input is DEFINITELY right). Can someone enligten me as to why I am seeing this weird error? What is going on that is causing this negative distance error? ===== def loadVectorsFromFile(self, limit, loc, assertAllPositive=True, inflate=True): """Inflate to prevent "negative" distance, we use 4 decimal points, so *10000 """ vectors = {} self.winfo("Each vector is set to have %d limit in length" % limit) with open( loc ) as inf: for line in filter(None, inf.read().split('\n')): l = line.split('\t') if limit: scores = map(float, l[1:limit+1]) else: scores = map(float, l[1:]) if inflate: vectors[ l[0]] = map( lambda x: int(x*10000), scores) #int might save space else: vectors[ l[0]] = scores if assertAllPositive: #Assert that it has no negative value for dirID, l in vectors.iteritems(): if reduce(operator.or_, map( lambda x: x < 0, l)): self.werror( "Vector %s has negative values!" % dirID) return vectors def main( self, inputDir, outputDir, limit=0, inFname="data.vectors.all", mappingFname='all.id.features.group.intermediate'): """ Loads vector from a file and start clustering INPUT vectors is { featureID: tfidfVector (list), } """ IDFeatureDic = loadIdFeatureGroupDicFromIntermediate( pjoin(self.configDir, mappingFname)) if not os.path.exists(outputDir): os.makedirs(outputDir) vectors = self.loadVectorsFromFile( limit, pjoin( inputDir, inFname)) for threshold in map( lambda x:float(x)/30, range(20,30)): clusters = self._hclustering(threshold, vectors) if clusters: outputLoc = pjoin(outputDir, "threshold.%s.result" % str(threshold)) with open(outputLoc, 'w') as outf: for clusterNo, cluster in clusters.iteritems(): outf.write('%s\n' % str(clusterNo)) for featureID in cluster: feature, group = IDFeatureDic[featureID] outline = "%s\t%s\n" % (feature, group) outf.write(outline.encode('utf-8')) outf.write("\n") else: continue def _hclustering(self, threshold, vectors): """function which you should call to vary the threshold vectors: { featureID: [ tfidf scores, tfidf score, .. ] """ clusters = defaultdict(list) if len(vectors) > 1: try: results = hierarchy.fclusterdata( vectors.values(), threshold, metric='cosine') except ValueError, e: self.werror("_hclustering: %s" % str(e)) return False for i, featureID in enumerate( vectors.keys()):

Read the article

sqlite over network share for failover

- by reinier

As a follow-up of this question: sqlite-over-a-network-share If I put the SQlite DB on a network share, but will not access it concurrently from different machines. I only have the SQLite db stored on a share so a cluster of failover computers can take over where one machine left off. Are there any inherent problems with that approach?

Read the article

Run Linux command as predefined user

- by vijay.shad

Hi all, I have created a shell script to start a server program. startup.sh start When above command will executes, it will try starts the server as adminuser. To achieve this my script has written like this. SUBIT="su - adminuser -c " SERVER_BOX_COMMAND_A="Server" ############## # Function to start cluster function start(){ $SUBIT "$SERVER_BOX_COMMAND_A" } When i execute the command it asks for password. Is there any other way to do this so, it will not ask for password.

Read the article

Can I use an NSDecimalNumber anywhere that an NSNumber is expected?

- by Nick Forge

NSDecimalNumber is a subclass of NSNumber, and from what I can tell, it implements all of the NSNumber methods as expected for an NSNumber instance. Given that, is it ok to give NSDecimalNumbers to any code that is expecting an NSNumber? The only possible issue might be code that checks that an argument is an instance of NSNumber, but since NSNumber is a class-cluster, code like this would have to check that the instance is a subclass of NSNumber, and NSDecimalNumber instances should pass the same tests.

Read the article

Jboss logging issue

- by balaji

I'm Working as deployer and server administrator. We use Jboss 4.0x AS to deploy our applications. The issue I'm facing is, Whenever we redeploy/restart the server, server.log is getting created but after sometime the logging goes off. Yes it is not at all updating the server.log file. Due to this, we could not trace the other critical issues we have. Actually we have two separate nodes and we do deploy/restarting the server separately on two nodes. We are facing the issue in both of our test and production environment. I could not trace out where exactly the issue is. Could you please help me in resolving the issue? If we have any other issues, we can check the log files. If log itself is not getting updated/logged, how can we move further in analyzing the issues without the recent/updated logs? Below are the logs found in the stdout.log: 18:55:50,303 INFO [Server] Core system initialized 18:55:52,296 INFO [WebService] Using RMI server codebase: http://kl121tez.is.klmcorp.net:8083/ 18:55:52,313 INFO [Log4jService$URLWatchTimerTask] Configuring from URL: resource:log4j.xml 18:55:52,860 ERROR [STDERR] LOG0026E The Log Manager cannot create the object AmasRBPFTraceLogger without a class name. 18:55:52,860 ERROR [STDERR] LOG0026E The Log Manager cannot create the object AmasRBPFMessageLogger without a class name. 18:55:54,273 ERROR [STDERR] LOG0026E The Log Manager cannot create the object AmasCacheTraceLogger without a class name. 18:55:54,274 ERROR [STDERR] LOG0026E The Log Manager cannot create the object AmasCacheMessageLogger without a class name. 18:55:54,334 ERROR [STDERR] LOG0026E The Log Manager cannot create the object JACCTraceLogger without a class name. 18:55:54,334 ERROR [STDERR] LOG0026E The Log Manager cannot create the object JACCMessageLogger without a class name. 18:55:56,059 INFO [ServiceEndpointManager] WebServices: jbossws-1.0.3.SP1 (date=200609291417) 18:55:56,635 INFO [Embedded] Catalina naming disabled 18:55:56,671 INFO [ClusterRuleSetFactory] Unable to find a cluster rule set in the classpath. Will load the default rule set. 18:55:56,672 INFO [ClusterRuleSetFactory] Unable to find a cluster rule set in the classpath. Will load the default rule set. 18:55:56,843 INFO [Http11BaseProtocol] Initializing Coyote HTTP/1.1 on http-0.0.0.0-8180 18:55:56,844 INFO [Catalina] Initialization processed in 172 ms 18:55:56,844 INFO [StandardService] Starting service jboss.web

Read the article

Using Java for 'nslookup -type=srv'

- by user321524

Hi: Is there a way in Java to do a Nslookup search on SRV records? I need to bind to Active Directory and would like to use the SRV records to help determine which of the ADs in the cluster are live. In command line the nslookup search string is: 'nslookup -type=srv _ ldap._tcp.ActiveDirectory domain name' Thanks.

Read the article

Google maps: place number in marker?

- by User

How can I display a number in the marker on a google map? I want to do server side clustering and I need to display how many points the cluster represents.

Read the article

Enthought Python, Sage, or others (in Unix clusters)

- by vailen

I am currently get access to a cluster of Unix machines, but they don't have the software I need (numpy, scipy, matplotlib, etc), and I have to install them by myself (I don't have the root permission, either, so commands like apt-get or yast doesn't work). In the worst case, I have to compile them all from source. Is there any better way to do so? I hear something about Enthought Python and Sage, but not sure what is the best way to do so. Any suggestion?

Read the article

Production system running JBoss clustering

- by portoalet

Hi, What are some of the largest systems that use JBoss clustering? What are the specs/config? I want to know whether JBoss cluster is really scalable or not. Thanks

Read the article

Can J2SE applications be clustered

- by ebe

Hi there all, i have numerous Corba servers and some other java apps (not web). I wish to cluster them if possible. Can this be done ? Many thanks

Read the article

How do I print out objects in an array in python?

- by Jonathan

I'm writing a code which performs a k-means clustering on a set of data. I'm actually using the code from a book called collective intelligence by O'Reilly. Everything works, but in his code he uses the command line and i want to write everything in notepad++. As a reference his line is >>>kclust=clusters.kcluster(data,k=10) >>>[rownames[r] for r in k[0]] Here is my code: from PIL import Image,ImageDraw def readfile(filename): lines=[line for line in file(filename)] # First line is the column titles colnames=lines[0].strip( ).split('\t')[1:] rownames=[] data=[] for line in lines[1:]: p=line.strip( ).split('\t') # First column in each row is the rowname rownames.append(p[0]) # The data for this row is the remainder of the row data.append([float(x) for x in p[1:]]) return rownames,colnames,data from math import sqrt def pearson(v1,v2): # Simple sums sum1=sum(v1) sum2=sum(v2) # Sums of the squares sum1Sq=sum([pow(v,2) for v in v1]) sum2Sq=sum([pow(v,2) for v in v2]) # Sum of the products pSum=sum([v1[i]*v2[i] for i in range(len(v1))]) # Calculate r (Pearson score) num=pSum-(sum1*sum2/len(v1)) den=sqrt((sum1Sq-pow(sum1,2)/len(v1))*(sum2Sq-pow(sum2,2)/len(v1))) if den==0: return 0 return 1.0-num/den class bicluster: def __init__(self,vec,left=None,right=None,distance=0.0,id=None): self.left=left self.right=right self.vec=vec self.id=id self.distance=distance def hcluster(rows,distance=pearson): distances={} currentclustid=-1 # Clusters are initially just the rows clust=[bicluster(rows[i],id=i) for i in range(len(rows))] while len(clust)>1: lowestpair=(0,1) closest=distance(clust[0].vec,clust[1].vec) # loop through every pair looking for the smallest distance for i in range(len(clust)): for j in range(i+1,len(clust)): # distances is the cache of distance calculations if (clust[i].id,clust[j].id) not in distances: distances[(clust[i].id,clust[j].id)]=distance(clust[i].vec,clust[j].vec) #print 'i' #print i #print #print 'j' #print j #print d=distances[(clust[i].id,clust[j].id)] if d<closest: closest=d lowestpair=(i,j) # calculate the average of the two clusters mergevec=[ (clust[lowestpair[0]].vec[i]+clust[lowestpair[1]].vec[i])/2.0 for i in range(len(clust[0].vec))] # create the new cluster newcluster=bicluster(mergevec,left=clust[lowestpair[0]], right=clust[lowestpair[1]], distance=closest,id=currentclustid) # cluster ids that weren't in the original set are negative currentclustid-=1 del clust[lowestpair[1]] del clust[lowestpair[0]] clust.append(newcluster) return clust[0] def kcluster(rows,distance=pearson,k=4): # Determine the minimum and maximum values for each point ranges=[(min([row[i] for row in rows]),max([row[i] for row in rows])) for i in range(len(rows[0]))] # Create k randomly placed centroids clusters=[[random.random( )*(ranges[i][1]-ranges[i][0])+ranges[i][0] for i in range(len(rows[0]))] for j in range(k)] lastmatches=None for t in range(100): print 'Iteration %d' % t bestmatches=[[] for i in range(k)] # Find which centroid is the closest for each row for j in range(len(rows)): row=rows[j] bestmatch=0 for i in range(k): d=distance(clusters[i],row) if d<distance(clusters[bestmatch],row): bestmatch=i bestmatches[bestmatch].append(j) # If the results are the same as last time, this is complete if bestmatches==lastmatches: break lastmatches=bestmatches # Move the centroids to the average of their members for i in range(k): avgs=[0.0]*len(rows[0]) if len(bestmatches[i])>0: for rowid in bestmatches[i]: for m in range(len(rows[rowid])): avgs[m]+=rows[rowid][m] for j in range(len(avgs)): avgs[j]/=len(bestmatches[i]) clusters[i]=avgs return bestmatches

Read the article

Jboss Cache as distributed state repository

- by michael lucas

Is using JBoss Cache as distributed state repository a good idea? Can JBoss Cache be applied in situation when you need the guarantee that each time you read something from repository you get the newest version of it? - and irrespective of which node in a cluster we consider?

Read the article

Choosing a distributed shared memory solution

- by mindas

I have a task to build a prototype for a massively scalable distributed shared memory (DSM) app. The prototype would only serve as a proof-of-concept, but I want to spend my time most effectively by picking the components which would be used in the real solution later on. The aim of this solution is to take data input from an external source, churn it and make the result available for a number of frontends. Those "frontends" would just take the data from the cache and serve it without extra processing. The amount of frontend hits on this data can literally be millions per second. The data itself is very volatile; it can (and does) change quite rapidly. However the frontends should see "old" data until the newest has been processed and cached. The processing and writing is done by a single (redundant) node while other nodes only read the data. In other words: no read-through behaviour. I was looking into solutions like memcached however this particular one doesn't fulfil all our requirements which are listed below: The solution must at least have Java client API which is reasonably well maintained as the rest of app is written in Java and we are seasoned Java developers; The solution must be totally elastic: it should be possible to add new nodes without restarting other nodes in the cluster; The solution must be able to handle failover. Yes, I realize this means some overhead, but the overall served data size isn't big (1G max) so this shouldn't be a problem. By "failover" I mean seamless execution without hardcoding/changing server IP address(es) like in memcached clients when a node goes down; Ideally it should be possible to specify the degree of data overlapping (e.g. how many copies of the same data should be stored in the DSM cluster); There is no need to permanently store all the data but there might be a need of post-processing of some of the data (e.g. serialization to the DB). Price. Obviously we prefer free/open source but we're happy to pay a reasonable amount if a solution is worth it. In any way, paid 24hr/day support contract is a must. The whole thing has to be hosted in our data centers so SaaS offerings like Amazon SimpleDB are out of scope. We would only consider this if no other options would be available. Ideally the solution would be strictly consistent (as in CAP); however, eventual consistence can be considered as an option. Thanks in advance for any ideas.

Read the article

Scalability comparison between different DBMSs

- by Björn Lindfors

By what factor does the performance (read queries/sec) increase when a machine is added to a cluster of machines running either: a Bigtable-like database MySQL? Google's research paper on Bigtable suggests that "near-linear" scaling is achieved can be achieved with Bigtable. This page here featuring MySQL's marketing jargon suggests that MySQL is capable of scaling linearly. Where is the truth?

Read the article

Very basic question about Hadoop and compressed input files

- by Luis Sisamon

I have started to look into Hadoop. If my understanding is right i could process a very big file and it would get split over different nodes, however if the file is compressed then the file could not be split and wold need to be processed by a single node (effectively destroying the advantage of running a mapreduce ver a cluster of parallel machines). My question is, assuming the above is correct, is it possible to split a large file manually in fixed-size chunks, or daily chunks, compress them and then pass a list of compressed input files to perform a mapreduce?

Read the article

Berkeley DB replication: upper limit on number of replicants?

- by Mark Harrison

I'm considering using Berkeley DB to cache some data on an application cluster. What's a reasonable upper limit on the number of nodes I can plan on Berkeley DB handling? Writing to the database will be from a single node.

Read the article

Using Hadoop, are my reducers guaranteed to get all the records with the same key?

- by samg

I'm running a hadoop job (using hive actually) which is supposed to uniq lines in a lot of text file. More specifically it chooses the most recently timestamped record for each key in the reduce step. Does hadoop guarantee that every record with the same key, output by the map step, will go to a single reducer, even if there are many reducers running across a cluster? I'm worried that the mapper output might be split after the shuffle happens, in the middle of a set of records with the same key.

Read the article

Mongrel not using latest deployed code, despite multiple restarts

- by ming yeow

What could be the potential reasons why mongrel does not use the latest code in the ~/current branch? The code changes are in the MODELs. The code changes in the CONTROLLERS TAKE EFFECT. I tried the following: god restart app deploying several times manually stopping mongrel cluster, deleting the pid files, and starting them again Anyone has similar experiences? Where could the server be potentially caching the model files?

Read the article

Tracking Hadoop job status via web interface? (Exposing Hadoop to internal clients in the company)

- by Eran Kampf

I want to develop a website that will allow analysts within the company to run Hadoop jobs (choose from a set of defined jobs) and see their job's status\progress. Is there an easy way to do this (get running jobs statuses etc.) via Ruby\Python? How do you expose your Hadoop cluster to internal clients on your company?

Read the article

Jboss logging issue - pl check this

- by balaji

I’m Working as deployer and server administrator. We use Jboss 4.0x AS to deploy our applications. The issue I'm facing is, Whenever we redeploy/restart the server, server.log is getting created but after sometime the logging goes off. Yes it is not at all updating the server.log file. Due to this, we could not trace the other critical issues we have. Actually we have two separate nodes and we do deploy/restarting the server separately on two nodes. We are facing the issue in both of our test and production environment. I could not trace out where exactly the issue is. Could you please help me in resolving the issue? If we have any other issues, we can check the log files. If log itself is not getting updated/logged, how can we move further in analyzing the issues without the recent/updated logs? Below are the logs found in the stdout.log: 18:55:50,303 INFO [Server] Core system initialized 18:55:52,296 INFO [WebService] Using RMI server codebase: http://kl121tez.is.klmcorp.net:8083/ 18:55:52,313 INFO [Log4jService$URLWatchTimerTask] Configuring from URL: resource:log4j.xml 18:55:52,860 ERROR [STDERR] LOG0026E The Log Manager cannot create the object AmasRBPFTraceLogger without a class name. 18:55:52,860 ERROR [STDERR] LOG0026E The Log Manager cannot create the object AmasRBPFMessageLogger without a class name. 18:55:54,273 ERROR [STDERR] LOG0026E The Log Manager cannot create the object AmasCacheTraceLogger without a class name. 18:55:54,274 ERROR [STDERR] LOG0026E The Log Manager cannot create the object AmasCacheMessageLogger without a class name. 18:55:54,334 ERROR [STDERR] LOG0026E The Log Manager cannot create the object JACCTraceLogger without a class name. 18:55:54,334 ERROR [STDERR] LOG0026E The Log Manager cannot create the object JACCMessageLogger without a class name. 18:55:56,059 INFO [ServiceEndpointManager] WebServices: jbossws-1.0.3.SP1 (date=200609291417) 18:55:56,635 INFO [Embedded] Catalina naming disabled 18:55:56,671 INFO [ClusterRuleSetFactory] Unable to find a cluster rule set in the classpath. Will load the default rule set. 18:55:56,672 INFO [ClusterRuleSetFactory] Unable to find a cluster rule set in the classpath. Will load the default rule set. 18:55:56,843 INFO [Http11BaseProtocol] Initializing Coyote HTTP/1.1 on http-0.0.0.0-8180 18:55:56,844 INFO [Catalina] Initialization processed in 172 ms 18:55:56,844 INFO [StandardService] Starting service jboss.web Please help..

Read the article

Performance bechmarking of MPI program in C

- by dks

I am new to MPI.Can anyone please suggest me how to do benchmarking of MPI programs in C. Cluster I am using is running Rocks 4.3(Mars Hill).

Read the article

How fast are EC/2 nodes between each other?

- by tesmar

Hi, I am looking to setup Amazon EC/2 nodes on rails with Riak. I am looking to be able to sync the riak DBs and if the cluster gets a query, to be able to tell where the data lies and retrieve it quickly. In your opinion(s), is EC/2 fast enough between nodes to query a Riak DB, return the results, and get them back to the client in a timely manner? I am new to all of this, so please be kind :)

Read the article

How to learn using Hadoop

- by ajay

hi, I want to learn hadoop. However, I don't have access to a cluster now. Is it possible for me to learn it and use it for writing programs and learn it properly. Would it be helpful to run multiple linux VMs and then use them as boxes to run hadoop? Or you think that is more of a stretch and running it on a multiple hosts is the same as running in on single host (in terms of setup, Hadoop API used, the architecture of the map-reduce programs etc) Thanks,

Read the article

What alternatives do I have if I want a distributed multi-master database?

- by Jonas

I will build a system where I want to reduce single-point-of-failures, and I need a database. Is there any (free) relational database systems that can handle multi-master setups good (i.e where it is easy to add and remove nodes) or is it better to go with a NoSQL-database? As what I have understood, a key-value store will handle this better. What database system do you recommend for a multi-master (cluster) setup?

Search Results

Search found 1815 results on 73 pages for 'percona xtradb cluster'.

Page 45/73 | < Previous Page | 41 42 43 44 45 46 47 48 49 50 51 52 | Next Page >

- by alex

- by disappearedng

- by reinier

- by vijay.shad

- by Nick Forge

- by balaji

- by user321524

- by User

- by vailen

- by portoalet

- by ebe

- by Jonathan

- by michael lucas

- by mindas

- by Björn Lindfors

- by Luis Sisamon

- by Mark Harrison

- by samg

- by ming yeow

- by Eran Kampf

- by balaji

- by dks

- by tesmar

- by ajay

- by Jonas

< Previous Page | 41 42 43 44 45 46 47 48 49 50 51 52 | Next Page >