Search Results

Search found 1781 results on 72 pages for 'cluster'.

Page 44/72 | < Previous Page | 40 41 42 43 44 45 46 47 48 49 50 51  | Next Page >

  • Enthought Python, Sage, or others (in Unix clusters)

    - by vailen
    I am currently get access to a cluster of Unix machines, but they don't have the software I need (numpy, scipy, matplotlib, etc), and I have to install them by myself (I don't have the root permission, either, so commands like apt-get or yast doesn't work). In the worst case, I have to compile them all from source. Is there any better way to do so? I hear something about Enthought Python and Sage, but not sure what is the best way to do so. Any suggestion?

    Read the article

  • How do I print out objects in an array in python?

    - by Jonathan
    I'm writing a code which performs a k-means clustering on a set of data. I'm actually using the code from a book called collective intelligence by O'Reilly. Everything works, but in his code he uses the command line and i want to write everything in notepad++. As a reference his line is >>>kclust=clusters.kcluster(data,k=10) >>>[rownames[r] for r in k[0]] Here is my code: from PIL import Image,ImageDraw def readfile(filename): lines=[line for line in file(filename)] # First line is the column titles colnames=lines[0].strip( ).split('\t')[1:] rownames=[] data=[] for line in lines[1:]: p=line.strip( ).split('\t') # First column in each row is the rowname rownames.append(p[0]) # The data for this row is the remainder of the row data.append([float(x) for x in p[1:]]) return rownames,colnames,data from math import sqrt def pearson(v1,v2): # Simple sums sum1=sum(v1) sum2=sum(v2) # Sums of the squares sum1Sq=sum([pow(v,2) for v in v1]) sum2Sq=sum([pow(v,2) for v in v2]) # Sum of the products pSum=sum([v1[i]*v2[i] for i in range(len(v1))]) # Calculate r (Pearson score) num=pSum-(sum1*sum2/len(v1)) den=sqrt((sum1Sq-pow(sum1,2)/len(v1))*(sum2Sq-pow(sum2,2)/len(v1))) if den==0: return 0 return 1.0-num/den class bicluster: def __init__(self,vec,left=None,right=None,distance=0.0,id=None): self.left=left self.right=right self.vec=vec self.id=id self.distance=distance def hcluster(rows,distance=pearson): distances={} currentclustid=-1 # Clusters are initially just the rows clust=[bicluster(rows[i],id=i) for i in range(len(rows))] while len(clust)>1: lowestpair=(0,1) closest=distance(clust[0].vec,clust[1].vec) # loop through every pair looking for the smallest distance for i in range(len(clust)): for j in range(i+1,len(clust)): # distances is the cache of distance calculations if (clust[i].id,clust[j].id) not in distances: distances[(clust[i].id,clust[j].id)]=distance(clust[i].vec,clust[j].vec) #print 'i' #print i #print #print 'j' #print j #print d=distances[(clust[i].id,clust[j].id)] if d<closest: closest=d lowestpair=(i,j) # calculate the average of the two clusters mergevec=[ (clust[lowestpair[0]].vec[i]+clust[lowestpair[1]].vec[i])/2.0 for i in range(len(clust[0].vec))] # create the new cluster newcluster=bicluster(mergevec,left=clust[lowestpair[0]], right=clust[lowestpair[1]], distance=closest,id=currentclustid) # cluster ids that weren't in the original set are negative currentclustid-=1 del clust[lowestpair[1]] del clust[lowestpair[0]] clust.append(newcluster) return clust[0] def kcluster(rows,distance=pearson,k=4): # Determine the minimum and maximum values for each point ranges=[(min([row[i] for row in rows]),max([row[i] for row in rows])) for i in range(len(rows[0]))] # Create k randomly placed centroids clusters=[[random.random( )*(ranges[i][1]-ranges[i][0])+ranges[i][0] for i in range(len(rows[0]))] for j in range(k)] lastmatches=None for t in range(100): print 'Iteration %d' % t bestmatches=[[] for i in range(k)] # Find which centroid is the closest for each row for j in range(len(rows)): row=rows[j] bestmatch=0 for i in range(k): d=distance(clusters[i],row) if d<distance(clusters[bestmatch],row): bestmatch=i bestmatches[bestmatch].append(j) # If the results are the same as last time, this is complete if bestmatches==lastmatches: break lastmatches=bestmatches # Move the centroids to the average of their members for i in range(k): avgs=[0.0]*len(rows[0]) if len(bestmatches[i])>0: for rowid in bestmatches[i]: for m in range(len(rows[rowid])): avgs[m]+=rows[rowid][m] for j in range(len(avgs)): avgs[j]/=len(bestmatches[i]) clusters[i]=avgs return bestmatches

    Read the article

  • Choosing a distributed shared memory solution

    - by mindas
    I have a task to build a prototype for a massively scalable distributed shared memory (DSM) app. The prototype would only serve as a proof-of-concept, but I want to spend my time most effectively by picking the components which would be used in the real solution later on. The aim of this solution is to take data input from an external source, churn it and make the result available for a number of frontends. Those "frontends" would just take the data from the cache and serve it without extra processing. The amount of frontend hits on this data can literally be millions per second. The data itself is very volatile; it can (and does) change quite rapidly. However the frontends should see "old" data until the newest has been processed and cached. The processing and writing is done by a single (redundant) node while other nodes only read the data. In other words: no read-through behaviour. I was looking into solutions like memcached however this particular one doesn't fulfil all our requirements which are listed below: The solution must at least have Java client API which is reasonably well maintained as the rest of app is written in Java and we are seasoned Java developers; The solution must be totally elastic: it should be possible to add new nodes without restarting other nodes in the cluster; The solution must be able to handle failover. Yes, I realize this means some overhead, but the overall served data size isn't big (1G max) so this shouldn't be a problem. By "failover" I mean seamless execution without hardcoding/changing server IP address(es) like in memcached clients when a node goes down; Ideally it should be possible to specify the degree of data overlapping (e.g. how many copies of the same data should be stored in the DSM cluster); There is no need to permanently store all the data but there might be a need of post-processing of some of the data (e.g. serialization to the DB). Price. Obviously we prefer free/open source but we're happy to pay a reasonable amount if a solution is worth it. In any way, paid 24hr/day support contract is a must. The whole thing has to be hosted in our data centers so SaaS offerings like Amazon SimpleDB are out of scope. We would only consider this if no other options would be available. Ideally the solution would be strictly consistent (as in CAP); however, eventual consistence can be considered as an option. Thanks in advance for any ideas.

    Read the article

  • Scalability comparison between different DBMSs

    - by Björn Lindfors
    By what factor does the performance (read queries/sec) increase when a machine is added to a cluster of machines running either: a Bigtable-like database MySQL? Google's research paper on Bigtable suggests that "near-linear" scaling is achieved can be achieved with Bigtable. This page here featuring MySQL's marketing jargon suggests that MySQL is capable of scaling linearly. Where is the truth?

    Read the article

  • Very basic question about Hadoop and compressed input files

    - by Luis Sisamon
    I have started to look into Hadoop. If my understanding is right i could process a very big file and it would get split over different nodes, however if the file is compressed then the file could not be split and wold need to be processed by a single node (effectively destroying the advantage of running a mapreduce ver a cluster of parallel machines). My question is, assuming the above is correct, is it possible to split a large file manually in fixed-size chunks, or daily chunks, compress them and then pass a list of compressed input files to perform a mapreduce?

    Read the article

  • Using Hadoop, are my reducers guaranteed to get all the records with the same key?

    - by samg
    I'm running a hadoop job (using hive actually) which is supposed to uniq lines in a lot of text file. More specifically it chooses the most recently timestamped record for each key in the reduce step. Does hadoop guarantee that every record with the same key, output by the map step, will go to a single reducer, even if there are many reducers running across a cluster? I'm worried that the mapper output might be split after the shuffle happens, in the middle of a set of records with the same key.

    Read the article

  • Mongrel not using latest deployed code, despite multiple restarts

    - by ming yeow
    What could be the potential reasons why mongrel does not use the latest code in the ~/current branch? The code changes are in the MODELs. The code changes in the CONTROLLERS TAKE EFFECT. I tried the following: god restart app deploying several times manually stopping mongrel cluster, deleting the pid files, and starting them again Anyone has similar experiences? Where could the server be potentially caching the model files?

    Read the article

  • Jboss logging issue - pl check this

    - by balaji
    I’m Working as deployer and server administrator. We use Jboss 4.0x AS to deploy our applications. The issue I'm facing is, Whenever we redeploy/restart the server, server.log is getting created but after sometime the logging goes off. Yes it is not at all updating the server.log file. Due to this, we could not trace the other critical issues we have. Actually we have two separate nodes and we do deploy/restarting the server separately on two nodes. We are facing the issue in both of our test and production environment. I could not trace out where exactly the issue is. Could you please help me in resolving the issue? If we have any other issues, we can check the log files. If log itself is not getting updated/logged, how can we move further in analyzing the issues without the recent/updated logs? Below are the logs found in the stdout.log: 18:55:50,303 INFO [Server] Core system initialized 18:55:52,296 INFO [WebService] Using RMI server codebase: http://kl121tez.is.klmcorp.net:8083/ 18:55:52,313 INFO [Log4jService$URLWatchTimerTask] Configuring from URL: resource:log4j.xml 18:55:52,860 ERROR [STDERR] LOG0026E The Log Manager cannot create the object AmasRBPFTraceLogger without a class name. 18:55:52,860 ERROR [STDERR] LOG0026E The Log Manager cannot create the object AmasRBPFMessageLogger without a class name. 18:55:54,273 ERROR [STDERR] LOG0026E The Log Manager cannot create the object AmasCacheTraceLogger without a class name. 18:55:54,274 ERROR [STDERR] LOG0026E The Log Manager cannot create the object AmasCacheMessageLogger without a class name. 18:55:54,334 ERROR [STDERR] LOG0026E The Log Manager cannot create the object JACCTraceLogger without a class name. 18:55:54,334 ERROR [STDERR] LOG0026E The Log Manager cannot create the object JACCMessageLogger without a class name. 18:55:56,059 INFO [ServiceEndpointManager] WebServices: jbossws-1.0.3.SP1 (date=200609291417) 18:55:56,635 INFO [Embedded] Catalina naming disabled 18:55:56,671 INFO [ClusterRuleSetFactory] Unable to find a cluster rule set in the classpath. Will load the default rule set. 18:55:56,672 INFO [ClusterRuleSetFactory] Unable to find a cluster rule set in the classpath. Will load the default rule set. 18:55:56,843 INFO [Http11BaseProtocol] Initializing Coyote HTTP/1.1 on http-0.0.0.0-8180 18:55:56,844 INFO [Catalina] Initialization processed in 172 ms 18:55:56,844 INFO [StandardService] Starting service jboss.web Please help..

    Read the article

  • How fast are EC/2 nodes between each other?

    - by tesmar
    Hi, I am looking to setup Amazon EC/2 nodes on rails with Riak. I am looking to be able to sync the riak DBs and if the cluster gets a query, to be able to tell where the data lies and retrieve it quickly. In your opinion(s), is EC/2 fast enough between nodes to query a Riak DB, return the results, and get them back to the client in a timely manner? I am new to all of this, so please be kind :)

    Read the article

  • How to learn using Hadoop

    - by ajay
    hi, I want to learn hadoop. However, I don't have access to a cluster now. Is it possible for me to learn it and use it for writing programs and learn it properly. Would it be helpful to run multiple linux VMs and then use them as boxes to run hadoop? Or you think that is more of a stretch and running it on a multiple hosts is the same as running in on single host (in terms of setup, Hadoop API used, the architecture of the map-reduce programs etc) Thanks,

    Read the article

  • What alternatives do I have if I want a distributed multi-master database?

    - by Jonas
    I will build a system where I want to reduce single-point-of-failures, and I need a database. Is there any (free) relational database systems that can handle multi-master setups good (i.e where it is easy to add and remove nodes) or is it better to go with a NoSQL-database? As what I have understood, a key-value store will handle this better. What database system do you recommend for a multi-master (cluster) setup?

    Read the article

  • System call time out?

    - by Arnold
    Hi, I'm using unix system() calls to gunzip and gzip files. With very large files sometimes (i.e. on the cluster compute node) these get aborted, while other times (i.e. on the login nodes) they go through. Is there some soft limit on the time a system call may take? What else could it be?

    Read the article

  • Table clusters in SQLServer

    - by Bruno Martinez
    In Oracle, a table cluster is a group of tables that share common columns and store related data in the same blocks. When tables are clustered, a single data block can contain rows from multiple tables. For example, a block can store rows from both the employees and departments tables rather than from only a single table: http://download.oracle.com/docs/cd/E11882_01/server.112/e10713/tablecls.htm#i25478 Can this be done in SQLServer?

    Read the article

  • Multicore programming: what's necessary to do it?

    - by Casey
    I have a quadcore processor and I would really like to take advantage of all those cores when I'm running quick simulations. The problem is I'm only familiar with the small Linux cluster we have in the lab and I'm using Vista at home. What sort of things do I want to look into for multicore programming with C or Java? What is the lingo that I want to google? Thanks for the help.

    Read the article

  • using qsub (sge) with multi-threaded applications

    - by dan12345
    i wanted to submit a multi-threaded job to the cluster network i'm working with - but the man page about qsub is not clear how this is done - By default i guess it just sends it as a normal job regardless of the multi-threading - but this might cause problems, i.e. sending many multi-threaded jobs to the same computer, slowing things down. Does anyone know how to accomplish this? thanks. The batch server system is sge.

    Read the article

  • Running OpenMPI on Windows XP

    - by iamweird
    Hi there. I'm trying to build a simple cluster based on Windows XP. I compiled OpenMPI-1.4.2 successfully, and tools like mpicc and ompi_info work too, but I can't get my mpirun working properly. The only output I can see is Z:\orterun --hostfile z:\hosts.txt -np 2 hostname [host0:04728] Failed to initialize COM library. Error code = -2147417850 [host0:04728] [[8946,0],0] ORTE_ERROR_LOG: Error in file ..\..\openmpi-1.4.2 \orte\mca\ess\hnp\ess_hnp_module.c at line 218 -------------------------------------------------------------------------- It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): orte_plm_init failed -- Returned value Error (-1) instead of ORTE_SUCCESS -------------------------------------------------------------------------- [host0:04728] [[8946,0],0] ORTE_ERROR_LOG: Error in file ..\..\openmpi-1.4.2 \orte\runtime\orte_init.c at line 132 -------------------------------------------------------------------------- It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): orte_ess_set_name failed -- Returned value Error (-1) instead of ORTE_SUCCESS -------------------------------------------------------------------------- [host0:04728] [[8946,0],0] ORTE_ERROR_LOG: Error in file ..\..\..\..\openmpi -1.4.2\orte\tools\orterun\orterun.c at line 543 Where z:\hosts.txt appears as follows: host0 host1 Z: is a shared network drive available to both host0 and host1. What my problem is and how do I fix it? Upd: Ok, this problem seems to be fixed. It seems to me that WideCap driver and/or software components causes this error to appear. A "clean" machine runs local task successfully. Anyway, I still cannot run a task within at least 2 machines, I'm getting following message: Z:\mpirun --hostfile z:\hosts.txt -np 2 hostname connecting to host1 username:cluster password:******** Save Credential?(Y/N) y [host0:04728] This feature hasn't been implemented yet. [host0:04728] Could not connect to namespace cimv2 on node host1. Error code =-2147024891 -------------------------------------------------------------------------- mpirun was unable to start the specified application as it encountered an error. More information may be available above. -------------------------------------------------------------------------- I googled a little and did all the things as described here: http://www.open-mpi.org/community/lists/users/2010/03/12355.php but I'm still getting the same error. Can anyone help me? Upd2: Error code -2147024891 might be WMI error WBEM_E_INVALID_PARAMETER (0x80041008) which occures when one of the parameters passed to the WMI call is not correct. Does this mean that the problem is in OpenMPI source code itself? Or maybe it's because of wrong/outdated wincred.h and credui.lib I used while building OpenMPI from the source code?

    Read the article

  • Windows Azure: Parallelization of the code

    - by veda
    I have some matrix multiplication operation. I want to parallelize the execution of those operations through multiple processors.. This can be done on high performance computing cluster using MPI (Message Passing Interface). Like wise, can I do some parallelization in the cloud using multiple worker roles. Is there any means for doing that.

    Read the article

  • Ushahidi - How to make the markers stay on the map on zoom change?

    - by Guttemberg
    I am using the platform Ushahidi Web-2.7.3 , see: http://ti5.net.br/provedorlegal.com.br, and when I zoom in beyond a certain level, the clustered markers disappear from the map. I also tested this on an older version of a site, see: http://movimentofichalimpa.org/mapa, where the clustered markers do not disappear on zooming in, but just become ungrouped, as is normal with a cluster strategy. How can I make the markers remain on the map when zooming in?

    Read the article

< Previous Page | 40 41 42 43 44 45 46 47 48 49 50 51  | Next Page >