Search Results

Search found 4291 results on 172 pages for 'cluster analysis'.

Page 95/172 | < Previous Page | 91 92 93 94 95 96 97 98 99 100 101 102  | Next Page >

  • Writing csv files with python with exact formatting parameters

    - by Ben Harrison
    I'm having trouble processing some csv data files for a project. The project's programmer has moved on to greener pastures, and now I'm trying to finish the data analysis up (I did/do the statistical analysis.) The programmer suggested using Python's csv reader to help break down the files, which I've had some success with, but not in a way I can use. This code is a little different from what I was trying before. I am essentially attempting to create an array. In the raw data format, the first 7 rows contain no data, and then each column contains 50 experiments, each with 4000 rows, for some 200,000 rows total. What I want to do is take each column and make it an individual csv file, with each experiment in its own column. So it would be an array of 50 columns and 4000 rows for each data type. The code here does break down the correct values, and I think the logic is okay, but the output is quoted the opposite of how I want it. I want the separators without quotes (the commas and spaces) and I want the element values in quotes. Right now it is doing just the opposite for both: element values with no quotes, and the separators in quotes. I've spent several hours trying to figure out how to do this to no avail.

        import csv
        ifile = open('00_follow_maverick.csv')
        epistemicfile = open('00_follower_maverick_EP.csv', 'w')
        reader = csv.reader(ifile)
        colnum = 0
        rownum = 0
        y = 0
        z = 8
        for column in reader:
            rownum = 4000 * y + z
            for element in column:
                writer = csv.writer(epistemicfile)
                if y <= 50:
                    y = y + 1
                    writer.writerow([element])
                    writer.writerow(',')
                    rownum = x * y + z
                if y > 50:
                    y = 0
                    z = z + 1
                    writer.writerow(' ')
                    rownum = x * y + z
                if z >= 4008:
                    break

    What is going on: I am taking each row in the raw data file in iterations of 4000, so that I can separate them with commas for the 50 experiments. When y, the experiment indicator here, reaches 50, it resets back to experiment 0 and adds 1 to z, which tells it which row to look at, by the formula 4000 * y + z. When it completes the rows for all 50 experiments, it is finished. The problem here is that I don't know how to get Python to write the actual values in quotes, and my separators outside of quotes. Any help will be most appreciated. Apologies if this seems a stupid question, I have no programming experience, this is my first attempt ever. Thank you.
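
    A minimal sketch of the quoting behaviour in question (not the project's code; the file names are just reused from the post for illustration). The quoted commas in the current output appear to come from writing the separator as a field of its own (writer.writerow(',')), which the writer then has to quote; csv.writer inserts the delimiters itself, and its quoting parameter controls whether field values are quoted:

        import csv

        ifile = open('00_follow_maverick.csv')
        ofile = open('00_follower_maverick_EP.csv', 'w')
        reader = csv.reader(ifile)
        # quoting=csv.QUOTE_ALL wraps every field value in quotes; the commas
        # the writer puts between fields stay unquoted. QUOTE_NONNUMERIC is an
        # alternative that leaves numbers bare.
        writer = csv.writer(ofile, quoting=csv.QUOTE_ALL)
        for row in reader:
            writer.writerow(row)   # e.g. "1.23","4.56","7.89"
        ofile.close()
        ifile.close()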

    Read the article

  • Possible to download entire whois database / list of registered domains?

    - by Parand
    I wanted to do some analysis on registered domain names. Looks like I can hit whois.internic.net to get information about each domain, but it also looks like there are rate limits that prevent me from doing large numbers of queries. Is there a way to periodically (say daily) grab the entire whois database? I really only care about whether a domain is registered or not, so I don't need the full whois information.
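
    For the single "registered or not" check mentioned above, a rough sketch over the WHOIS protocol (a plain TCP query on port 43) against the whois.internic.net server from the post is below. Treating a "No match for" reply as "not registered" is an assumption about that server's output format, and this does nothing about the rate limits or the bulk-download question:

        import socket

        def is_registered(domain):
            # WHOIS: send the domain name terminated by CRLF, read the reply.
            s = socket.create_connection(('whois.internic.net', 43), timeout=10)
            s.sendall(domain + '\r\n')
            reply = ''
            while True:
                chunk = s.recv(4096)
                if not chunk:
                    break
                reply += chunk
            s.close()
            return 'No match for' not in reply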

    Read the article

  • Why is it still so hard to write software?

    - by nornagon
    Writing software, I find, is composed of two parts: the Idea, and the Implementation. The Idea is about thinking: "I have this problem; how do I solve it?" and further, "how do I solve it elegantly?" The answers to these questions are obtainable by thinking about algorithms and architecture. The ideas come partially through analysis and partially through insight and intuition. The Idea is usually the easy part. You talk to your friends and co-workers and you nut it out in a meeting or over coffee. It takes an hour or two, plus revisions as you implement and find new problems.

    The Implementation phase of software development is so difficult that we joke about it. "Oh," we say, "the rest is a Simple Matter of Code." Because it should be simple, but it never is. We used to write our code on punch cards, and that was hard: mistakes were very difficult to spot, so we had to spend extra effort making sure every line was perfect. Then we had serial terminals: we could see all our code at once, search through it, organise it hierarchically and create things abstracted from raw machine code. First we had assemblers, one level up from machine code. Mnemonics freed us from remembering the machine code. Then we had compilers, which freed us from remembering the instructions. We had virtual machines, which let us step away from machine-specific details. And now we have advanced tools like Eclipse and Xcode that perform analysis on our code to help us write code faster and avoid common pitfalls.

    But writing code is still hard. Writing code is about understanding large, complex systems, and the tools we have today simply don't go very far to help us with that. When I click "find all references" in Eclipse, I get a list of them at the bottom of the window. I click on one, and I'm torn away from what I was looking at, forced to context switch. Java architecture is usually several levels deep, so I have to switch and switch and switch until I find what I'm really looking for -- by which time I've forgotten where I came from. And I do that all day until I've understood a system. It's taxing mentally, and Eclipse doesn't do much that couldn't be done in 1985 with grep, except eat hundreds of megs of RAM.

    Writing code has barely changed since we were staring at amber on black. We have the theoretical groundwork for much more advanced tools, tools that actually work to help us comprehend and extend the complex systems we work with every day. So why is writing code still so hard?

    Read the article

  • Programmatically printing git revision and checking for uncommitted changes

    - by Andrew Grimm
    To ensure that my scientific analysis is reproducible, I'd like to programmatically check if there are any modifications to the code base that aren't checked in, and if not, print out what commit is being used. For example, if there are uncommitted changes, it should output:

        Warning: uncommitted changes made. This output may not be reproducible.

    Else, produce:

        Current commit: d27ec73cf2f1df89cbccd41494f579e066bad6fe

    Ideally, it should use "plumbing", not "porcelain".
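
    A minimal sketch of that check using git plumbing commands from Python. git diff-index reports changes to tracked files (staged or not) relative to HEAD but ignores untracked files, which is an assumption about what "modifications" means here; git rev-parse prints the current commit:

        import subprocess

        def report_revision():
            # exit status 0 means no differences between HEAD and the work tree/index
            dirty = subprocess.call(['git', 'diff-index', '--quiet', 'HEAD', '--'])
            if dirty:
                print 'Warning: uncommitted changes made. This output may not be reproducible.'
            else:
                commit = subprocess.Popen(['git', 'rev-parse', 'HEAD'],
                                          stdout=subprocess.PIPE).communicate()[0].strip()
                print 'Current commit: ' + commit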

    Read the article

  • Enthought Python, Sage, or others (in Unix clusters)

    - by vailen
    I currently have access to a cluster of Unix machines, but they don't have the software I need (numpy, scipy, matplotlib, etc.), and I have to install it myself. I don't have root permission either, so commands like apt-get or yast don't work. In the worst case, I have to compile everything from source. Is there a better way? I've heard of Enthought Python and Sage, but I'm not sure which is the best route. Any suggestions?

    Read the article

  • Running out of memory while analyzing a Java Heap Dump

    - by Abel Morelos
    Hi, I have a curious problem. I need to analyze a Java heap dump (from an IBM JRE) that is 1.5 GB in size. The problem is that while analyzing the dump (I've tried HeapAnalyzer and the IBM Memory Analyzer 0.5), the tools run out of memory and I can't really analyze the dump. I have 3 GB of RAM in my machine, but it seems that's not enough for the 1.5 GB dump. My question is: do you know of a heap dump analysis tool (supporting IBM JRE dumps) that I could run with the amount of memory I have? Thanks.

    Read the article

  • How do I print out objects in an array in python?

    - by Jonathan
    I'm writing code which performs k-means clustering on a set of data. I'm actually using the code from the O'Reilly book Programming Collective Intelligence. Everything works, but in his code he uses the command line, and I want to write everything in Notepad++. For reference, his lines are:

        >>> kclust=clusters.kcluster(data,k=10)
        >>> [rownames[r] for r in kclust[0]]

    Here is my code:

        from PIL import Image,ImageDraw

        def readfile(filename):
            lines=[line for line in file(filename)]
            # First line is the column titles
            colnames=lines[0].strip( ).split('\t')[1:]
            rownames=[]
            data=[]
            for line in lines[1:]:
                p=line.strip( ).split('\t')
                # First column in each row is the rowname
                rownames.append(p[0])
                # The data for this row is the remainder of the row
                data.append([float(x) for x in p[1:]])
            return rownames,colnames,data

        from math import sqrt

        def pearson(v1,v2):
            # Simple sums
            sum1=sum(v1)
            sum2=sum(v2)
            # Sums of the squares
            sum1Sq=sum([pow(v,2) for v in v1])
            sum2Sq=sum([pow(v,2) for v in v2])
            # Sum of the products
            pSum=sum([v1[i]*v2[i] for i in range(len(v1))])
            # Calculate r (Pearson score)
            num=pSum-(sum1*sum2/len(v1))
            den=sqrt((sum1Sq-pow(sum1,2)/len(v1))*(sum2Sq-pow(sum2,2)/len(v1)))
            if den==0: return 0
            return 1.0-num/den

        class bicluster:
            def __init__(self,vec,left=None,right=None,distance=0.0,id=None):
                self.left=left
                self.right=right
                self.vec=vec
                self.id=id
                self.distance=distance

        def hcluster(rows,distance=pearson):
            distances={}
            currentclustid=-1
            # Clusters are initially just the rows
            clust=[bicluster(rows[i],id=i) for i in range(len(rows))]
            while len(clust)>1:
                lowestpair=(0,1)
                closest=distance(clust[0].vec,clust[1].vec)
                # loop through every pair looking for the smallest distance
                for i in range(len(clust)):
                    for j in range(i+1,len(clust)):
                        # distances is the cache of distance calculations
                        if (clust[i].id,clust[j].id) not in distances:
                            distances[(clust[i].id,clust[j].id)]=distance(clust[i].vec,clust[j].vec)
                        #print 'i'
                        #print i
                        #print
                        #print 'j'
                        #print j
                        #print
                        d=distances[(clust[i].id,clust[j].id)]
                        if d<closest:
                            closest=d
                            lowestpair=(i,j)
                # calculate the average of the two clusters
                mergevec=[
                    (clust[lowestpair[0]].vec[i]+clust[lowestpair[1]].vec[i])/2.0
                    for i in range(len(clust[0].vec))]
                # create the new cluster
                newcluster=bicluster(mergevec,left=clust[lowestpair[0]],
                                     right=clust[lowestpair[1]],
                                     distance=closest,id=currentclustid)
                # cluster ids that weren't in the original set are negative
                currentclustid-=1
                del clust[lowestpair[1]]
                del clust[lowestpair[0]]
                clust.append(newcluster)
            return clust[0]

        import random   # needed for the random initial centroids in kcluster()

        def kcluster(rows,distance=pearson,k=4):
            # Determine the minimum and maximum values for each point
            ranges=[(min([row[i] for row in rows]),max([row[i] for row in rows]))
                    for i in range(len(rows[0]))]
            # Create k randomly placed centroids
            clusters=[[random.random( )*(ranges[i][1]-ranges[i][0])+ranges[i][0]
                       for i in range(len(rows[0]))] for j in range(k)]
            lastmatches=None
            for t in range(100):
                print 'Iteration %d' % t
                bestmatches=[[] for i in range(k)]
                # Find which centroid is the closest for each row
                for j in range(len(rows)):
                    row=rows[j]
                    bestmatch=0
                    for i in range(k):
                        d=distance(clusters[i],row)
                        if d<distance(clusters[bestmatch],row): bestmatch=i
                    bestmatches[bestmatch].append(j)
                # If the results are the same as last time, this is complete
                if bestmatches==lastmatches: break
                lastmatches=bestmatches
                # Move the centroids to the average of their members
                for i in range(k):
                    avgs=[0.0]*len(rows[0])
                    if len(bestmatches[i])>0:
                        for rowid in bestmatches[i]:
                            for m in range(len(rows[rowid])):
                                avgs[m]+=rows[rowid][m]
                        for j in range(len(avgs)):
                            avgs[j]/=len(bestmatches[i])
                        clusters[i]=avgs
            return bestmatches
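
    A minimal usage sketch of the lines the book runs at the interactive prompt, adapted so they can sit at the bottom of the same script and be run as a file ('mydata.txt' is a placeholder for the tab-separated data file):

        # kcluster returns a list of k lists of row indices; map them back to
        # the row names read from the file and print each cluster.
        rownames, colnames, data = readfile('mydata.txt')
        kclust = kcluster(data, k=10)
        for i in range(len(kclust)):
            print 'Cluster %d:' % i
            print [rownames[r] for r in kclust[i]]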

    Read the article

  • Explicit disable MySQL query cache in some parts of program

    - by jack
    In a Django project, some cronjob programs are used mainly for administrative or analysis purposes, e.g. generating site usage stats, rotating the user activity log, etc. We would rather MySQL did not cache the queries those programs run, to save memory and improve query cache efficiency. Is it possible to turn off the MySQL query cache explicitly in those programs while keeping it enabled for the other parts, including all of views.py?
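
    A minimal sketch of the per-connection approach, assuming a MySQL version where the session query_cache_type variable can be changed (the global setting is untouched); the table name in the comment is a made-up example:

        # Inside the cronjob / management command: disable the query cache for
        # this database connection only.
        from django.db import connection

        cursor = connection.cursor()
        cursor.execute("SET SESSION query_cache_type = OFF")
        # ... the cronjob's analysis queries run here, uncached ...
        # Individual statements can also opt out with the SQL_NO_CACHE hint, e.g.:
        # cursor.execute("SELECT SQL_NO_CACHE COUNT(*) FROM app_activitylog")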

    Read the article

  • How can I debug a Windows service that crashes?

    - by Christopher
    I have a .NET Windows service that appears to be crashing due to C0000005 (access violation--according to Dr Watson). When I attach the VS debugger to it--whether I build it with or without symbols--the VS debugger simply exits when the service crashes, instead of breaking in to give me a chance to do any investigation. Is that to be expected, or am I doing something wrong? Will using WinDbg let me do something more in real time (obviously, WinDbg lets me do crash dump analysis)? Thanks!

    Read the article

  • Choosing a distributed shared memory solution

    - by mindas
    I have a task to build a prototype for a massively scalable distributed shared memory (DSM) app. The prototype would only serve as a proof of concept, but I want to spend my time most effectively by picking the components that would be used in the real solution later on. The aim of this solution is to take data input from an external source, churn it and make the result available to a number of frontends. Those "frontends" would just take the data from the cache and serve it without extra processing. The number of frontend hits on this data can literally be millions per second. The data itself is very volatile; it can (and does) change quite rapidly. However the frontends should see "old" data until the newest has been processed and cached. The processing and writing is done by a single (redundant) node while other nodes only read the data. In other words: no read-through behaviour. I was looking into solutions like memcached, but this particular one doesn't fulfil all our requirements, which are listed below:

    - The solution must at least have a Java client API which is reasonably well maintained, as the rest of the app is written in Java and we are seasoned Java developers;
    - The solution must be totally elastic: it should be possible to add new nodes without restarting other nodes in the cluster;
    - The solution must be able to handle failover. Yes, I realize this means some overhead, but the overall served data size isn't big (1G max) so this shouldn't be a problem. By "failover" I mean seamless execution without hardcoding/changing server IP address(es) like in memcached clients when a node goes down;
    - Ideally it should be possible to specify the degree of data overlapping (e.g. how many copies of the same data should be stored in the DSM cluster);
    - There is no need to permanently store all the data, but there might be a need for post-processing of some of the data (e.g. serialization to the DB);
    - Price. Obviously we prefer free/open source, but we're happy to pay a reasonable amount if a solution is worth it. In any case, a paid 24hr/day support contract is a must;
    - The whole thing has to be hosted in our data centers, so SaaS offerings like Amazon SimpleDB are out of scope. We would only consider this if no other options were available;
    - Ideally the solution would be strictly consistent (as in CAP); however, eventual consistency can be considered as an option.

    Thanks in advance for any ideas.

    Read the article

  • Learning the Introspection API (used by FxCop)

    - by Anand Patel
    Microsoft's FxCop tool uses the Introspection API. This API could be used to develop new code analysis tools, but it is not well documented, and I was not able to find any blogs that explain it in breadth and depth. The knowledge gained by understanding the API can also be used for writing custom FxCop rules. Does anybody know of any blogs or resources that explain it?

    Read the article

  • Very basic question about Hadoop and compressed input files

    - by Luis Sisamon
    I have started to look into Hadoop. If my understanding is right, I could process a very big file and it would get split over different nodes; however, if the file is compressed then it cannot be split and would need to be processed by a single node (effectively destroying the advantage of running MapReduce over a cluster of parallel machines). My question is: assuming the above is correct, is it possible to split a large file manually into fixed-size chunks, or daily chunks, compress them, and then pass a list of compressed input files to a MapReduce job?
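
    A rough sketch of the manual pre-splitting idea (plain Python, not Hadoop-specific): cut the big file into fixed-size chunks of whole lines and gzip each chunk, so every compressed part can be handed to the job as a separate input file. The chunk size and file name are placeholders:

        import gzip

        CHUNK_BYTES = 64 * 1024 * 1024   # illustrative target chunk size

        def split_and_compress(path):
            # Write whole lines into numbered .gz parts, rolling over to a new
            # part once the current one exceeds CHUNK_BYTES of raw text.
            part, written, out = 0, 0, None
            for line in open(path):
                if out is None or written >= CHUNK_BYTES:
                    if out is not None:
                        out.close()
                    out = gzip.open('%s.part%04d.gz' % (path, part), 'wb')
                    part, written = part + 1, 0
                out.write(line)
                written += len(line)
            if out is not None:
                out.close()

        split_and_compress('biglog.txt')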

    Read the article

  • Scalability comparison between different DBMSs

    - by Björn Lindfors
    By what factor does the performance (read queries/sec) increase when a machine is added to a cluster of machines running either a Bigtable-like database or MySQL? Google's research paper on Bigtable suggests that "near-linear" scaling can be achieved with Bigtable. This page here, featuring MySQL's marketing jargon, suggests that MySQL is capable of scaling linearly. Where is the truth?

    Read the article

  • Exception when indexing text documents with Lucene, using SnowballAnalyzer for cleaning up

    - by Julia
    Hello! I am indexing documents with Lucene and am trying to apply the SnowballAnalyzer for punctuation and stopword removal from text. I keep getting the following error:

        IllegalAccessError: tried to access method
        org.apache.lucene.analysis.Tokenizer.<init>(Ljava/io/Reader;)V
        from class org.apache.lucene.analysis.snowball.SnowballAnalyzer

    Here is the code; I would very much appreciate help! I am new to this.

        public class Indexer {

            private Indexer(){};

            private String[] stopWords = {....};
            private String indexName;
            private IndexWriter iWriter;
            private static String FILES_TO_INDEX = "/Users/ssi/forindexing";

            public static void main(String[] args) throws Exception {
                Indexer m = new Indexer();
                m.index("./newindex");
            }

            public void index(String indexName) throws Exception {
                this.indexName = indexName;
                final File docDir = new File(FILES_TO_INDEX);
                if(!docDir.exists() || !docDir.canRead()){
                    System.err.println("Something wrong... " + docDir.getPath());
                    System.exit(1);
                }
                Date start = new Date();
                PerFieldAnalyzerWrapper analyzers = new PerFieldAnalyzerWrapper(new SimpleAnalyzer());
                analyzers.addAnalyzer("text", new SnowballAnalyzer("English", stopWords));
                Directory directory = FSDirectory.open(new File(this.indexName));
                IndexWriter.MaxFieldLength maxLength = IndexWriter.MaxFieldLength.UNLIMITED;
                iWriter = new IndexWriter(directory, analyzers, true, maxLength);
                System.out.println("Indexing to dir..........." + indexName);
                if(docDir.isDirectory()){
                    File[] files = docDir.listFiles();
                    if(files != null){
                        for (int i = 0; i < files.length; i++) {
                            try {
                                indexDocument(files[i]);
                            }catch (FileNotFoundException fnfe){
                                fnfe.printStackTrace();
                            }
                        }
                    }
                }
                System.out.println("Optimizing...... ");
                iWriter.optimize();
                iWriter.close();
                Date end = new Date();
                System.out.println("Time to index was " + (end.getTime()-start.getTime()) + " milliseconds");
            }

            private void indexDocument(File someDoc) throws IOException {
                Document doc = new Document();
                Field name = new Field("name", someDoc.getName(), Field.Store.YES, Field.Index.ANALYZED);
                Field text = new Field("text", new FileReader(someDoc), Field.TermVector.WITH_POSITIONS_OFFSETS);
                doc.add(name);
                doc.add(text);
                iWriter.addDocument(doc);
            }
        }

    Read the article

  • can i do multiple things in one command on linux?

    - by Jason94
    I'm testing something where I'm compiling some code and analysing the output with a Perl script. So first I run make, manually copy and paste the output into errors.txt, and then run my Perl script (running: perl analysis.pl) in the terminal. Is there a way I can do this with just one line in bash?

    Read the article

  • A strategy to troubleshoot/ fix application crashes in Windows?

    - by Manav Sharma
    All, over a period of time I have observed that fixing issues related to application crashes is a discipline in itself. Some people have a nice way of attacking such problems: ranging from viewing the Event Viewer to running static/dynamic memory analysis tools to some of their personal favorites, these people have developed this into an art. Can we share articles, links, or personal approaches that we use to understand, troubleshoot, and fix such issues? Thanks

    Read the article

  • Meaning of parameters in a Google query?

    - by blinry
    Are there any resources on what the parameters in a Google query mean? Any analysis of how the Google search pages work internally? Examples would be http://www.google.com/#hl=en&source=hp&q=lol&aq=f&aqi=&aql=&oq=&fp=45675624562456 or http://www.google.com/url?sa=t&source=web&ct=res&cd=11&ved=KJSGHFKSDJF&url=sfdgagasdgasdgasgasg&rct=j&q=fghthwrteghedgf&ei=asdfasdfsa&usg=asdfasdfasf

    Read the article

  • Using Hadoop, are my reducers guaranteed to get all the records with the same key?

    - by samg
    I'm running a Hadoop job (using Hive, actually) which is supposed to de-duplicate lines across a lot of text files. More specifically, it chooses the most recently timestamped record for each key in the reduce step. Does Hadoop guarantee that every record with the same key, output by the map step, will go to a single reducer, even if there are many reducers running across a cluster? I'm worried that the mapper output might be split after the shuffle happens, in the middle of a set of records with the same key.
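
    For what it's worth, a reducer written against that guarantee looks like the sketch below (a Hadoop Streaming style Python reducer, not the Hive job itself): it assumes every record for a given key reaches the same reducer, sorted and grouped by key, so keeping the latest timestamp per key needs no cross-reducer coordination. The "key<TAB>timestamp<TAB>value" layout is an assumed input format for illustration:

        import sys

        # Input lines arrive already partitioned and sorted by key, so all
        # records for one key are consecutive. Timestamp comparison is done as
        # strings, which works for zero-padded / ISO-style timestamps.
        current_key, best = None, None
        for line in sys.stdin:
            key, ts, value = line.rstrip('\n').split('\t', 2)
            if key != current_key:
                if best is not None:
                    print '%s\t%s\t%s' % best
                current_key, best = key, None
            if best is None or ts > best[1]:
                best = (key, ts, value)
        if best is not None:
            print '%s\t%s\t%s' % best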

    Read the article
