Search Results

Search found 4291 results on 172 pages for 'cluster analysis'.

Page 124/172

Identify cause of hundreds of AJP threads in Tomcat

- by Rich

We have two Tomcat 6.0.20 servers fronted by Apache, with communication between the two using AJP. Tomcat in turn consumes web services on a JBoss cluster. This morning, one of the Tomcat machines was using 100% of CPU on 6 of the 8 cores on our machine. We took a heap dump using JConsole, and then tried to connect JVisualVM to get a profile to see what was taking all the CPU, but this caused Tomcat to crash. At least we had the heap dump! I have loaded the heap dump into Eclipse MAT, where I have found that we have 565 instances of java.lang.Thread. Some of these, obviously, are entirely legitimate, but the vast majority are named "ajp-6009-XXX" where XXX is a number. I know my way around Eclipse MAT pretty well, but haven't been able to find an explanation for it. If anyone has some pointers as to why Tomcat may be doing this, or some hints on finding out why using Eclipse MAT, that'd be appreciated!

What is the difference between Multiple R-squared and Adjusted R-squared in a single-variate least s

- by fmark

Could someone explain to the statistically naive what the difference between Multiple R-squared and Adjusted R-squared is? I am doing a single-variate regression analysis as follows:

    v.lm <- lm(epm ~ n_days, data=v)
    print(summary(v.lm))

Results:

    Call:
    lm(formula = epm ~ n_days, data = v)

    Residuals:
        Min      1Q  Median      3Q     Max
    -693.59 -325.79   53.34  302.46  964.95

    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)  2550.39      92.15  27.677   <2e-16 ***
    n_days        -13.12       5.39  -2.433   0.0216 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 410.1 on 28 degrees of freedom
    Multiple R-squared: 0.1746,     Adjusted R-squared: 0.1451
    F-statistic: 5.921 on 1 and 28 DF,  p-value: 0.0216

Apologies for the newbiness of this question.
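
For reference, the two figures in this output are tied together by the usual adjustment formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1). The following is a minimal sketch, assuming n = 30 observations (the 28 residual degrees of freedom plus the two estimated coefficients) and p = 1 predictor; the function name `adjusted_r_squared` is made up for illustration.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Penalise R-squared for the number of predictors in the model."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Values read from the summary output; n_obs = 28 residual df + 2 coefficients
# is an assumption about how the degrees of freedom were counted.
adj = adjusted_r_squared(0.1746, n_obs=30, n_predictors=1)
print(round(adj, 4))  # 0.1451
```

With a single predictor the penalty is small; it grows as more predictors are added relative to the number of observations.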

How to find a coding buddy

- by Lirik

I was reading Jeff Atwood's blog and he mentioned that he was suffering from code-paralysis (he called it analysis paralysis, but I feel like it's also code paralysis) when he didn't have a code buddy: http://www.codinghorror.com/blog/ Unfortunately I think that Jeff has set the bar a bit high, because he only works with developers who are really amazing. The only reason really amazing developers would work with me is if I was really amazing too, but sometimes I don't feel that amazing... the only thing I feel is that if I had a coding buddy I could be amazing :). I'm working on a project and I don't have many friends that are programmers, let alone friends that have time to spend on extracurricular activities. Jeff seems to have been able to find like-minded people that are actually willing to work together. I feel like I'm in a code-paralysis stage too and I need some coding buddies, where would I find some? How would I approach them?

[java] Efficiency of while(true) ServerSocket Listen

- by Submerged

I am wondering if a typical while(true) ServerSocket listen loop takes an entire core to wait and accept a client connection (even when implementing Runnable and using Thread.start()). I am implementing a type of distributed computing cluster and each computer needs every core it has for computation. A master node needs to communicate with these computers (invoking static methods that modify the algorithm's functioning). The reason I need to use sockets is the cross-platform / cross-language capability. In some cases, PHP will be invoking these Java static methods. I used a Java profiler (YourKit) and I can see my running ServerSocket listen thread; it never sleeps and it's always running. Is there a better approach to do what I want? Or will the performance hit be negligible? Please feel free to offer any suggestion if you can think of a better way (I've tried RMI, but it isn't supported cross-language). Thanks everyone.

C# how to correctly dispose of an SmtpClient?

- by JL

VS 2010 code analysis reports the following:

    Warning 4 CA2000 : Microsoft.Reliability : In method 'Mailer.SendMessage()', object 'client' is not disposed along all exception paths. Call System.IDisposable.Dispose on object 'client' before all references to it are out of scope.

My code is:

    public void SendMessage()
    {
        SmtpClient client = new SmtpClient();
        client.Send(Message);
        client.Dispose();
        DisposeAttachments();
    }

How should I correctly dispose of client?

Update: to answer Jon's question, here is the dispose attachments functionality:

    private void DisposeAttachments()
    {
        foreach (Attachment attachment in Message.Attachments)
        {
            attachment.Dispose();
        }
        Message.Attachments.Dispose();
        Message = null;
    }

Graphical Programming Language

- by prosseek

In control engineering or instrumentation, I see that Simulink and LabVIEW (G) are pretty popular. In ESL design, I see that Agilent SystemVue is gaining some popularity. If you look at well-established compiler theory, almost 100% of it is about textual languages. But how about graphical languages? Is there any noticeable research or discussion about graphical programming languages, in terms of:

- Theory about graphical languages: syntactic/semantic analysis and whatever relevant expressiveness (actually, I asked a question about it at SO - http://stackoverflow.com/questions/2427496/what-do-you-mean-by-the-expressiveness-in-programming-lanuguage)
- Possibility of the graphical language
- ...

Or what do you think about graphical programming languages?

Downloading Eclipse's Source Code

- by digiarnie

I'm doing a study on large Java projects and would like to view the source code for Eclipse. I have gone to this url (http://wiki.eclipse.org/index.php/CVS_Howto) and figured that the most useful cvs repository for me to look at would be this one: :pserver:[email protected]:/cvsroot/eclipse (The Eclipse platform project) However, when looking at this repository, it has so many modules! Which modules should I be trying to check out? I don't necessarily want to build the IDE from source, however, I just want to get the core Eclipse code base to perform some analysis. Would I just check out any modules starting with "org.eclipse..."? Should I be checking out any of the others? Or is there an easier way to get the source? I read somewhere that you can get the source from the binary version of Eclipse but I am unsure where to find the source.

Creating a "less"-like console pager interface for pysqlite3 database

- by Eric

I would like to add some interactive capability to a Python CLI application I've written that stores data in a SQLite3 database. Currently, my app reads in a certain type of file, parses and analyzes it, puts the analysis data into the db, and spits the formatted records to stdout (which I generally pipe to a file). There are on the order of a million records in this file. Ideally, I would like to eliminate that text file situation altogether and just loop after that "parse and analyze" part, displaying a screen's worth of records, and allowing the user to page through them and enter some commands that will edit the records. The backend part I know how to do. Can anyone suggest a good starting point for creating that pager frontend either directly in the console (like the pager "less"), through ncurses, or some other system?
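
As a rough illustration of the "screen's worth of records" idea, the sketch below pulls rows from SQLite one page at a time with `cursor.fetchmany()` instead of loading a million records at once. The table name `records`, its columns, and the helper `iter_pages` are hypothetical stand-ins, not part of the application described.

```python
import sqlite3


def iter_pages(conn: sqlite3.Connection, query: str, page_size: int):
    """Yield successive lists of at most page_size rows for the given query."""
    cursor = conn.execute(query)
    while True:
        rows = cursor.fetchmany(page_size)
        if not rows:
            break
        yield rows


def demo() -> list:
    # Hypothetical in-memory table standing in for the real analysis data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, value TEXT)")
    conn.executemany(
        "INSERT INTO records (value) VALUES (?)",
        [(f"row {i}",) for i in range(1, 8)],
    )
    pages = list(iter_pages(conn, "SELECT id, value FROM records ORDER BY id", 3))
    conn.close()
    return pages


pages = demo()
for number, page in enumerate(pages, start=1):
    print(f"-- page {number} --")
    for row in page:
        print(row)
```

In an interactive pager the page size could come from `shutil.get_terminal_size().lines`, and the loop would wait for a key press or command between pages rather than printing them all.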

Is there a macro facility for Java or C#?

- by h2g2java

Macros are useful. Therefore, I occasionally bemoan the absence of macros in Java and C#. Macros allow me to force in-lining while keeping the code-manageability of non-macro code. Is there any Java- or C#-based project/product out there that effectively allows macros or specifying in-line expansion? I am thinking of something like

    @macro public void hello(int x){ ... }

or, when I call a method, an @inline annotation preceding the call that would cause the called method to be in-lined. Or should I just trust the compiler to make the best decision for me, so that at the best of its analysis it might in-line a call? I hope this question will not lead to debating the pros/cons/usefulness of macros.

How can I loop through variables in SPSS? I want to avoid code duplication.

- by chucknelson

Is there a "native" SPSS way to loop through some variable names? All I want to do is take a list of variables (that I define) and run the same procedure for them.

Pseudo-code - not really a good example, but gets the point across...

    for i in varlist['a','b','c'] do
        FREQUENCIES VARIABLES=varlist[i] / ORDER=ANALYSIS.
    end

I've noticed that people seem to just use R or Python SPSS plugins to achieve this basic array functionality, but I don't know how soon I can get those configured (if ever) on my installation of SPSS. SPSS has to have some native way to do this...right?

Tool for parsing smtp logs that finds bounces

- by Željko Filipin

Our web application sends e-mails. We have lots of users, and we get lots of bounces. For example, a user changes company and his company e-mail is no longer valid. To find bounces, I parse the SMTP log file with Log Parser. Some bounces are great, like 550+#[email protected]. There is [email protected] in the bounce. But some do not have the e-mail in the error message, like 550+No+such+recipient. I have created a simple Ruby script that parses the logs (it uses Log Parser) to find which mail caused something like 550+No+such+recipient. I am just surprised that I could not find a tool that does it. I have found tools like Zabbix and Splunk for log analysis, but they look like overkill for such a simple task. Does anybody know a tool that would parse SMTP logs and find bounces and the e-mails that cause them? Edit: the SMTP server is Microsoft SMTP server.

What is order notation f(n)=O(g(n))?

- by Lopa

2 questions:

Question 1: under what circumstances would this [O(f(n))=O(k.f(n))] be the most appropriate form of time-complexity analysis?

Question 2: working from the mathematical definition of O notation, show that O(f(n))=O(k.f(n)) for a positive constant k.

My view: for the first one, I think it is the average-case and worst-case form of time complexity. Am I right? And what else should I write for that? For the second one, I think we need to define the function mathematically, so is the answer something like: the multiplication by a constant just corresponds to a readjustment of the value of the arbitrary constant 'k' in the definition of O?
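
The "readjustment of the constant" intuition can be written out as two inclusions. This is only a sketch, assuming the common textbook definition of O for non-negative functions; the witness constants c, c' and thresholds n_0, n_0' are names introduced here, kept distinct from the fixed factor k in the question.

```latex
% Assumed definition: g \in O(f) iff there exist c > 0 and n_0 such that
%   0 \le g(n) \le c\, f(n) for all n \ge n_0.
\begin{align*}
g \in O(f) &\;\Longrightarrow\; g(n) \le c\,f(n) = \frac{c}{k}\,\bigl(k\,f(n)\bigr)
  \quad (n \ge n_0)
  &&\Longrightarrow\; g \in O(k\,f) \text{ with constant } c/k, \\
g \in O(k\,f) &\;\Longrightarrow\; g(n) \le c'\,\bigl(k\,f(n)\bigr) = (c'k)\,f(n)
  \quad (n \ge n_0')
  &&\Longrightarrow\; g \in O(f) \text{ with constant } c'k.
\end{align*}
% Since k > 0, both c/k and c'k are positive constants, so O(f) = O(k f).
```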

NLB and Host Header Value

- by Hafeez

Background: We are using MOSS 2007 in a farm configuration: 2 WFEs, 1 indexer and SQL Server. MS NLB is used for load balancing. A host header value, mapped to the virtual IP of the cluster in DNS, is used when creating the web applications in MOSS, and all of them share port 80.

Problem: When a client tries to access the web applications that are configured with host header values, both WFEs hang for 5 minutes; they stop responding to ping and the browser shows 'Page not found'. In the Application Log on the WFE, this error is registered: "provider: TCP Provider, error: 0 - The semaphore timeout period has expired". Interestingly, the web application with no host header value, hosted on a different port, works correctly. Any clue to solve this problem will be helpful. Thanks, Hafeez

Out of memory error while using clusterdata in MATLAB

- by Hossein

Hi, I am trying to cluster a matrix (size: 20057x2):

    T = clusterdata(X,cutoff);

but I get this error:

    ??? Error using == pdistmex
    Out of memory. Type HELP MEMORY for your options.

    Error in == pdist at 211
    Y = pdistmex(X',dist,additionalArg);

    Error in == linkage at 139
    Z = linkagemex(Y,method,pdistArg);

    Error in == clusterdata at 88
    Z = linkage(X,linkageargs{1},pdistargs);

    Error in == kmeansTest at 2
    T = clusterdata(X,1);

Can someone help me? I have 4 GB of RAM, but I think that the problem comes from somewhere else.
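
One possible reading of the trace: clusterdata calls linkage, which calls pdist, and pdist produces one distance per unordered pair of rows, that is n(n-1)/2 values. A back-of-the-envelope sketch of that array's size, assuming 8-byte double-precision values (the function name is hypothetical):

```python
def pairwise_distance_bytes(n_points: int, bytes_per_value: int = 8) -> int:
    """Bytes needed to hold one distance per unordered pair of points."""
    n_pairs = n_points * (n_points - 1) // 2
    return n_pairs * bytes_per_value


n_bytes = pairwise_distance_bytes(20057)
print(n_bytes)                    # 1609052768
print(round(n_bytes / 2**30, 2))  # size in GiB, about 1.5
```

That is roughly 1.5 GiB in a single array, before any temporary copies, which may be hard to allocate as one block even on a machine with 4 GB of RAM.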

All possible values of int from the smallest to the largest, using Java.

- by Totophil

Write a program to print out all possible values of int data type from the smallest to the largest, using Java. Some notable solutions as of 8th of May 2009, 10:44 GMT: 1) Daniel Lew was the first to post correctly working code. 2) Kris has provided the simplest solution for the given problem. 3) Tom Hawtin - tackline, came up arguably with the most elegant solution. 4) mmyers pointed out that printing is likely to become a bottleneck and can be improved through buffering. 5) Jay's brute force approach is notable since, besides defying the core point of programming, the resulting source code takes about 128 GB and will blow compiler limits. As a side note I believe that the answers do demonstrate that it could be a good interview question, as long as the emphasis is not on the ability to remember trivia about the data type overflow and its implications (that can be easily spotted during unit testing), or the way of obtaining MAX and MIN limits (can easily be looked up in the documentation) but rather on the analysis of various ways of dealing with the problem.
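
The overflow point mentioned here can be illustrated briefly. In Java, a loop condition such as `i <= Integer.MAX_VALUE` is always true for an int, so the increment wraps to the minimum value and the loop never ends. The sketch below simulates that wrap-around in Python with a narrow 8-bit width so that it finishes instantly; `wrap_signed` and `all_values` are hypothetical helper names, and this is not one of the solutions credited in the post.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Wrap value into the signed two's-complement range of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value


def all_values(bits: int):
    """Yield every signed value of the given width, smallest to largest."""
    smallest = -(1 << (bits - 1))
    largest = (1 << (bits - 1)) - 1
    i = smallest
    while True:
        yield i
        if i == largest:  # stop before the increment that would wrap around
            break
        i = wrap_signed(i + 1, bits)


values = list(all_values(8))
print(values[0], values[-1], len(values))  # -128 127 256
print(wrap_signed(127 + 1, 8))             # -128: the wrap that traps a naive loop
```

The same shape (emit, test for the maximum, then increment) is one way to avoid the trap with a real 32-bit int.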

NDepend: How to not display 'tier' assemblies in dependency graph?

- by Edward Buatois

I was able to do this in an earlier version of nDepend by going to tools-options and setting which assemblies would be part of the analysis (and ignore the rest). The latest version of the trial version of nDepend lets me set it, but it seems to ignore the setting and always analyze all assemblies whether I want it to or not. I tried to delete the "tier" assemblies by moving them over to the "application assemblies" list, but when I delete them out of there, they just get added back to the "tier" list, which I can't ignore. I don't want my dependency graph to contain assemblies like "system," "system.xml," and "system.serialization!" I want only MY assemblies in the dependency graph! Or is that a paid-version feature now? Is there a way to do what I'm talking about?

Hadoop on windows server

- by Luca Martinetti

Hello, I'm thinking about using Hadoop to process large text files on my existing Windows 2003 servers (about 10 quad-core machines with 16 GB of RAM). The questions are:

- Is there any good tutorial on how to configure a Hadoop cluster on Windows?
- What are the requirements? Java + Cygwin + sshd? Anything else?
- HDFS: does it play nice on Windows?
- I'd like to use Hadoop in streaming mode. Any advice, tool or trick to develop my own mappers / reducers in C#?
- What do you use for submitting and monitoring the jobs?

Thanks

MYSQL KEY-VALUE PAIR Viability

- by Amit

Hi, I am new to MySQL and I am looking for some answers to the following questions:

a) Can MySQL Community Server be leveraged for a key-value pair type database?
b) Which MySQL engine is best suited for a key-value pair type database?
c) Is MySQL Cluster a must for horizontal scaling of a key-value based datastore, or can it be achieved using MySQL replication?
d) Are there any docs or whitepapers on best practices when implementing a KV datastore on MySQL?
e) Are there any known big implementations other than FriendFeed doing KV pairs using MySQL?

Would really appreciate some advice from all you MySQL gurus out there! Thanks in advance, Amit

Essential skills of a Data Scientist

- by harshsinghal

I would like to know more about the relevant skills in the arsenal of a Data Scientist, and with new technologies coming in every day, how one picks and chooses the essentials. A few ideas germane to this discussion:

- Knowing SQL and the use of a DB such as MySQL or PostgreSQL was great till the advent of NoSQL and non-relational databases. MongoDB, CouchDB etc. are becoming popular for working with web-scale data.
- Knowing a stats tool like R is enough for analysis, but to create applications one may need to add Java, Python, and others to the list.
- Data now comes in the form of text, URLs and multimedia, to name a few, and there are different paradigms associated with their manipulation.
- What about cluster computing, parallel computing, the cloud, Amazon EC2, Hadoop?
- OLS regression now has Artificial Neural Networks, Random Forests and other relatively exotic machine learning/data mining algorithms for company.

Thoughts?

Getting clusters of rows close together in time

- by Mike

I have a table basically like so:

    ID | ItemID | Start             | End
    ---|--------|-------------------|------------------
    1  | 234    | 10/20/09 8:34:22  | 10/20/09 8:35:10
    2  | 274    | 10/20/09 8:35:30  | 10/20/09 8:36:27
    3  | 272    | 10/21/09 12:15:00 | 10/21/09 12:17:00
    4  | 112    | 10/21/09 12:20:14 | 10/21/09 12:21:21
    5  | 15     | 10/21/09 12:22:39 | 10/21/09 12:24:15

There are two "clusters" of entries here, 1-2 and 3-5 separated by a gap in time, specifically 30 minutes is what I'm interested in. What I would like is the first and last rows of the cluster of entries. This is fairly easy to achieve by retrieving all the rows and looping through them in order of start time, but I'd like to have it in SQL if possible. I'm using SQL Server 2008, thanks.
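
For comparison with whatever SQL is eventually used, the "retrieve all rows and loop in order of start time" approach mentioned here can be sketched as follows. It is only an illustration: it assumes the gap is measured from the previous row's End to the next row's Start, and the names `ROWS`, `FMT` and `cluster_bounds` are made up.

```python
from datetime import datetime, timedelta

FMT = "%m/%d/%y %H:%M:%S"

# Rows copied from the sample table: (ID, ItemID, Start, End).
ROWS = [
    (1, 234, "10/20/09 8:34:22", "10/20/09 8:35:10"),
    (2, 274, "10/20/09 8:35:30", "10/20/09 8:36:27"),
    (3, 272, "10/21/09 12:15:00", "10/21/09 12:17:00"),
    (4, 112, "10/21/09 12:20:14", "10/21/09 12:21:21"),
    (5, 15, "10/21/09 12:22:39", "10/21/09 12:24:15"),
]


def cluster_bounds(rows, max_gap=timedelta(minutes=30)):
    """Return (first_id, last_id) for each run of rows in which every row
    starts no more than max_gap after the previous row ended."""
    ordered = sorted(rows, key=lambda r: datetime.strptime(r[2], FMT))
    bounds = []
    first = last = prev_end = None
    for row in ordered:
        start = datetime.strptime(row[2], FMT)
        end = datetime.strptime(row[3], FMT)
        if prev_end is None or start - prev_end > max_gap:
            if first is not None:
                bounds.append((first, last))  # close the previous cluster
            first = row[0]                    # open a new one
        last = row[0]
        prev_end = end
    if first is not None:
        bounds.append((first, last))
    return bounds


print(cluster_bounds(ROWS))  # [(1, 2), (3, 5)]
```

The result serves as a reference to check a set-based SQL version against.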

Garbage collection of Strings returned from C# method calls in ascx pages

- by Icarus

Hi, For a web application developed on ASP.NET, we are finding that for user control files (ascx) we are returning long strings as a result of method calls. These are embedded in the ascx pages using the special tags <% %> When performing memory dump analysis for the application, we find that many of those strings are not being garbage collected. Also, the ascx pages are compiled to temporary DLLs and they are held in memory. Is this responsible for causing the long strings to remain in memory and not be garbage collected ? Note : The strings are larger than 85K in size.

Does anyone know a better alternative to MS Excel's Solver?

- by tundal45

My company has to crunch a lot of data and part of the process involves running the solver and plotting a graph through resulting data points. Obviously there is a lot of copy and paste involved and the whole process is shaky, error prone and all round cluster-fudge. I was wondering if there was an alternative to the solver that can be used so that even if we have to use excel to plot the final graph, there will be a lot less data that needs to be copied and pasted back and forth. It would be great especially if the tool could be easily integrated into a .NET application but I am open to suggestions that may require a little bit of code-fu to get this to work. Thanks!

Parallelism in Python

- by fmark

What are the options for achieving parallelism in Python? I want to perform a bunch of CPU-bound calculations over some very large rasters, and would like to parallelise them. Coming from a C background, I am familiar with three approaches to parallelism:

- Message-passing processes, possibly distributed across a cluster, e.g. MPI.
- Explicit shared-memory parallelism, using either pthreads or fork(), pipe(), et al.
- Implicit shared-memory parallelism, using OpenMP.

Deciding on an approach to use is an exercise in trade-offs. In Python, what approaches are available and what are their characteristics? Is there a clusterable MPI clone? What are the preferred ways of achieving shared-memory parallelism? I have heard reference to problems with the GIL, as well as references to tasklets. In short, what do I need to know about the different parallelization strategies in Python before choosing between them?

infoWindow on MarkerClusterer

- by vishwanath

I need an infoWindow to open instead of the map zooming in when clicking on the ClusterMarker. I am using the Gmaps util library MarkerClusterer for creating clusters of markers. I tried changing the following line in markerclusterer.js

    ClusterMarker_.prototype = new GOverlay();

to

    ClusterMarker_.prototype = new GMarker();

so that I can get the openInfoWindow() function in the ClusterMarker, but that didn't work; I got an error. If possible, please suggest a solution so that this can be done with MarkerClusterer, or else any other library that is able to do this. Any help will be appreciated.

side effect gotchas in python/numpy? horror stories and narrow escapes wanted

- by shabbychef

I am considering moving from Matlab to Python/numpy for data analysis and numerical simulations. I have used Matlab (and SML-NJ) for years, and am very comfortable in the functional environment without side effects (barring I/O), but am a little reluctant about the side effects in Python. Can people share their favorite gotchas regarding side effects, and if possible, how they got around them? As an example, I was a bit surprised when I tried the following code in Python:

    lofls = [[]] * 4    # an accident waiting to happen!
    lofls[0].append(7)  # not what I was expecting...
    print(lofls)        # gives [[7], [7], [7], [7]]

    # instead, I should have done this (I think)
    lofls = [[] for x in range(4)]
    lofls[0].append(7)  # only appends to the first list
    print(lofls)        # gives [[7], [], [], []]

Thanks in advance.
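
The behaviour in this example comes down to object identity: multiplying a list repeats references to the same inner list, while the comprehension builds a new list on each iteration. A small sketch that checks this with `is` (the variable names are illustrative only):

```python
shared = [[]] * 4                     # four references to one list object
independent = [[] for _ in range(4)]  # four separate list objects

# Every slot of `shared` is the very same object.
all_same = all(item is shared[0] for item in shared)
# No two slots of `independent` are the same object.
any_same = any(
    independent[i] is independent[j]
    for i in range(4)
    for j in range(i + 1, 4)
)
print(all_same, any_same)  # True False

shared[0].append(7)
independent[0].append(7)
print(shared)       # [[7], [7], [7], [7]]
print(independent)  # [[7], [], [], []]
```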
