cluster analysis - Page 66

SSAS: distribution of measures over percentage

- by Alex

Hi there, I am running an SSAS cube that stores facts about HTTP requests. There is a column "Time Taken" that stores the number of milliseconds a particular HTTP request took, like this:

    RequestID   Time Taken
    ----------------------
    1           0
    2           10
    3           20
    4           20
    5           2000

I want to provide a report through Excel that shows the distribution of those timings by percentage of requests, with statements like "90% of all requests took less than 20 milliseconds". For the data above:

    100%  <= 2000
     80%  <= 20
     60%  <= 20
     40%  <= 10
     20%  <= 0

I am pretty much lost as to the right approach for designing the aggregations, calculations, etc. needed to offer this analysis through Excel. Any ideas?

Thanks,
Alex
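
Each row of that analysis is a percentile of Time Taken: the smallest time T such that at least P% of requests took <= T. The sketch below computes these with the nearest-rank method on the sample data. It is plain Python, not MDX, and only illustrates the arithmetic the cube would have to expose; how to model it as SSAS aggregations or calculated members is not shown.

```python
from typing import List, Sequence, Tuple


def nearest_rank_percentile(values: Sequence[int], p: int) -> int:
    """Smallest value v such that at least p% of the values are <= v (nearest-rank method)."""
    if not values:
        raise ValueError("no values")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    # Integer ceiling of p * N / 100 avoids floating-point rounding at exact boundaries.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


def distribution(values: Sequence[int], percents: Sequence[int]) -> List[Tuple[int, int]]:
    """(percent, threshold) pairs, one per requested percentage."""
    return [(p, nearest_rank_percentile(values, p)) for p in percents]


if __name__ == "__main__":
    time_taken = [0, 10, 20, 20, 2000]  # the sample "Time Taken" column
    for p, ms in distribution(time_taken, [100, 80, 60, 40, 20]):
        print(f"{p:>3}% of requests took <= {ms} ms")
```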

Using clang to analyze C++ code

- by aneccodeal

We want to do some fairly simple analysis of our users' C++ code and then use that information to instrument their code (basically, regenerate their code with a bit of instrumentation code added) so that users can run a dynamic analysis of their code and get statistics on things like the ranges of values of certain numeric types. clang should handle enough C++ by now for the kind of code our users would throw at it, and since clang's C++ coverage is continuously improving, it will be even better by the time we're done. So how does one go about using clang as a standalone parser like this? We're thinking we could just generate an AST and then walk it, looking for objects of the classes we're interested in tracking. I'd be interested in hearing from others who are using clang without LLVM.

RequestDispatcher forward between Tomcat instances

- by MontyBongo

I have a scenario with a single entry-point Servlet and further Servlets, to which requests are forwarded, that undertake heavy processing. I am looking at options for distributing this load, and I would like to know whether it is possible, using Tomcat or another platform, to forward requests between Servlets sitting on different servers using a cluster-type configuration or similar. I have found some documentation on clustering Servlets and Tomcat, but from what I can see none of it indicates whether Servlet request forwarding is possible:

http://java.sun.com/blueprints/guidelines/designing_enterprise_applications_2e/web-tier/web-tier5.html
http://tomcat.apache.org/tomcat-5.5-doc/cluster-howto.html

Python k-means algorithm

- by Eeyore

I am looking for a Python implementation of the k-means algorithm, with examples, to cluster and cache my database of coordinates.
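
A minimal pure-Python sketch of Lloyd's k-means (assign each point to its nearest centre, move each centre to the mean of its points, repeat until assignments stop changing). It assumes planar (x, y) coordinates and Euclidean distance; for latitude/longitude spread over a large area, a great-circle distance would be more appropriate. For real workloads, libraries such as scikit-learn (`sklearn.cluster.KMeans`) provide tuned implementations.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int, iterations: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means for 2-D points. Returns (centres, label of each point)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centres: List[Point] = rng.sample(list(points), k)  # Forgy init: k input points
    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        # Update step: each centre moves to the mean of its members.
        new_centres: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centres.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centres.append(centres[c])  # empty cluster: leave its centre in place
        done = new_labels == labels
        labels, centres = new_labels, new_centres
        if done:
            break  # assignments stopped changing, so the centres are stable too
    return centres, labels
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` puts the first three points in one cluster and the last three in the other.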

Is there a log file analyzer for log4j files?

- by Juha Syrjälä

I am looking for some kind of analyzer tool for log files generated by log4j. I am looking for something more advanced than grep. What are you using for log file analysis?

I am looking for the following kinds of features:

- The tool should tell me how many times a given log statement or stack trace has occurred, preferably with support for some kind of patterns (e.g. the number of log statements matching 'User [a-z]* logged in').
- Breakdowns by log level (how many INFO and DEBUG lines) and by the class that initiated the log message would be nice.
- Breakdown by date (how many log statements in a given time period).
- Which log lines commonly occur together?
- Support for several files, since I am using log rolling.
- Hot-spot analysis: find whether there is some time period with an unusually high number of log statements.
- Either command-line or GUI is fine.
- Open source is preferred, but I am also interested in commercial offerings.

My log4j configuration uses org.apache.log4j.PatternLayout with the pattern %d %p %c - %m%n, but that could be adapted for an analyzer tool.
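
Several of the requested breakdowns can be sketched with a short script before reaching for a dedicated tool. The version below assumes the PatternLayout above with log4j's default ISO8601 `%d` format (e.g. `2010-03-01 12:00:01,123`); it counts events by level, by logging category (`%c`) and by hour, counts messages matching a regex, and treats lines that don't start a new event (stack-trace frames) as continuation lines. It is an illustration, not an existing analyzer.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Assumes "%d %p %c - %m%n" with the default ISO8601 date, e.g.
# "2010-03-01 12:00:01,123 INFO com.example.Login - User alice logged in".
EVENT_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2},\d{3} "
    r"(?P<level>[A-Z]+) (?P<category>\S+) - (?P<message>.*)$"
)


def summarize(lines: Iterable[str], message_pattern: str) -> Dict[str, object]:
    """Count events by level, by category and by hour, plus messages matching a regex."""
    wanted = re.compile(message_pattern)
    by_level: Counter = Counter()
    by_category: Counter = Counter()
    by_hour: Counter = Counter()
    matches = 0
    continuation = 0  # lines such as stack-trace frames that do not start a new event
    for line in lines:
        m = EVENT_RE.match(line.rstrip("\n"))
        if m is None:
            continuation += 1
            continue
        by_level[m["level"]] += 1
        by_category[m["category"]] += 1
        by_hour[f'{m["date"]} {m["hour"]}:00'] += 1
        if wanted.search(m["message"]):
            matches += 1
    return {"by_level": by_level, "by_category": by_category, "by_hour": by_hour,
            "matches": matches, "continuation_lines": continuation}
```

Because it accepts any iterable of lines, several rolled files can be fed in at once with `itertools.chain(open(a), open(b))`. The `by_hour` counter is also a starting point for spotting hot spots.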

<100% Test coverage - best practices in selecting test areas

- by Paul Nathan

Suppose you're working on a project and the time/money budget does not allow 100% coverage of all code and paths. It then follows that some critical subset of your code needs to be tested. Clearly a 'gut-check' approach can be used, where intuition and manual analysis produce some sort of test coverage that will be 'OK'. However, I'm presuming there are best practices, approaches, or processes that identify critical elements up to some threshold and let you focus your tests on those blocks. For example, one popular process for identifying failures in manufacturing is Failure Mode and Effects Analysis (FMEA). I'm looking for a process (or processes) to identify critical testing blocks in software.
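
One way to carry the FMEA idea over to test selection, sketched under assumptions: rate each component's failure modes for severity, occurrence and detection on 1-10 scales, compute the risk priority number (RPN = severity × occurrence × detection), and spend the test budget in descending RPN order. The component names and ratings below are made up, and the ratings themselves remain judgment calls.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class FailureMode:
    component: str   # hypothetical component name
    severity: int    # 1 = negligible effect ... 10 = catastrophic
    occurrence: int  # 1 = very unlikely ... 10 = almost certain
    detection: int   # 1 = almost certainly caught otherwise ... 10 = unlikely to be caught

    def __post_init__(self) -> None:
        for name in ("severity", "occurrence", "detection"):
            if not 1 <= getattr(self, name) <= 10:
                raise ValueError(f"{name} must be between 1 and 10")

    @property
    def rpn(self) -> int:
        # FMEA risk priority number: severity x occurrence x detection.
        return self.severity * self.occurrence * self.detection


def prioritise(modes: Sequence[FailureMode], budget: int) -> List[FailureMode]:
    """The `budget` highest-RPN failure modes, highest first (ties broken by name)."""
    return sorted(modes, key=lambda m: (-m.rpn, m.component))[:budget]
```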

build .pyc prob

- by Apache

Hi experts, I build a .py file as follows:

    python /root/pyinstaller-1.4/Makespec.py test.py
    python /root/pyinstaller-1.4/Build.py test.spec

This works fine. Then I tried to build from my .pyc as follows:

    python /root/pyinstaller-1.4/Makespec.py test.pyc
    python /root/pyinstaller-1.4/Build.pyc test.spec

but it generates the following error:

    checking Analysis
    building because inputs changed
    running Analysis outAnalysis0.toc
    Analyzing: /root/pyinstaller-1.4/support/_mountzlib.py
    Analyzing: /root/pyinstaller-1.4/support/useUnicode.py
    Analyzing: test.pyc
    Traceback (most recent call last):
      File "/root/pyinstaller-1.4/Build.py", line 1160, in <module>
        main(args[0], configfilename=opts.configfile)
      File "/root/pyinstaller-1.4/Build.py", line 1148, in main
        build(specfile)
      File "/root/pyinstaller-1.4/Build.py", line 1111, in build
        execfile(spec)
      File "test.spec", line 3, in <module>
        pathex=['/root/test'])
      File "/root/pyinstaller-1.4/Build.py", line 245, in __init__
        self.postinit()
      File "/root/pyinstaller-1.4/Build.py", line 196, in postinit
        self.assemble()
      File "/root/pyinstaller-1.4/Build.py", line 314, in assemble
        analyzer.analyze_script(script)
      File "/root/pyinstaller-1.4/mf.py", line 559, in analyze_script
        co = compile(string.replace(stuff, "\r\n", "\n"), fnm, 'exec')
    TypeError: compile() expected string without null bytes

Why does this error occur? Can't we build from a .pyc, or is there another way to build it?
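
The last frame shows the analyzer reading the script and passing its contents to `compile()`, which expects source text. A .pyc holds a header plus marshalled bytecode, which typically contains NUL bytes, and that is what the TypeError is complaining about. The Python 3 sketch below (not PyInstaller code; `compile_source` is a hypothetical helper) makes the same distinction explicit by rejecting bytecode before calling `compile()`.

```python
import importlib.util
from types import CodeType


def compile_source(data: bytes, filename: str) -> CodeType:
    """Compile Python *source* bytes, rejecting compiled .pyc content up front.

    A .pyc starts with the interpreter's magic number and contains marshalled
    bytecode with NUL bytes, which compile() refuses; this just gives a clearer error.
    """
    if data.startswith(importlib.util.MAGIC_NUMBER) or b"\x00" in data:
        raise ValueError(f"{filename} looks like compiled bytecode; analyze the .py source instead")
    return compile(data, filename, "exec")
```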

OpenCV's cvKMeans2 - choosing clusters

- by Goffrey

Hi, I'm using cvKMeans2 for clustering, but I'm not sure how it works in general, specifically the part where it chooses the clusters. I thought it set the initial cluster positions from the given samples. That would mean that at the end of the clustering process every cluster has at least one sample, so the output array of cluster labels would contain the full range of numbers (for 100 clusters, the numbers 0 to 99). But when I checked the output labels, I realised that some labels weren't used at all and only some were. So, does anyone know how it works? Or how should I use the parameters of cvKMeans2 to do what I want (because I'm not sure I'm using them right)? I'm calling cvKMeans2 with the optional parameters as well:

    cvKMeans2(descriptorMat, N_CLUSTERS, clusterLabels,
              cvTermCriteria(CV_TERMCRIT_EPS + CV_TERMCRIT_ITER, 10, 1.0),
              1, 0, 0, clusterCenters, 0);

Thanks for any advice!
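
In k-means generally (this says nothing about cvKMeans2's internals), a label can go unused if a centre starts somewhere no sample is closest to, or if it loses all its samples during a later iteration. One common post-processing heuristic, sketched here in Python with a hypothetical helper, is to re-seed each empty cluster with the point currently farthest from its own centre, taking it only from a cluster that keeps at least one member. The donor cluster's centre is left stale, so another k-means iteration should follow.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def reseed_empty_clusters(points: Sequence[Point], centers: Sequence[Point],
                          labels: Sequence[int]) -> Tuple[List[Point], List[int]]:
    """Give every empty cluster one member: the point farthest from its current centre.

    Returns new (centers, labels) lists; the inputs are not modified.
    """
    new_centers = list(centers)
    new_labels = list(labels)
    k = len(new_centers)
    for c in range(k):
        if c in new_labels:
            continue
        counts = [new_labels.count(j) for j in range(k)]
        # Only take points from clusters that will still be non-empty afterwards.
        candidates = [i for i in range(len(points)) if counts[new_labels[i]] > 1]
        if not candidates:
            break  # fewer points than clusters: nothing sensible left to move
        far = max(candidates, key=lambda i: math.dist(points[i], new_centers[new_labels[i]]))
        new_labels[far] = c
        new_centers[c] = tuple(points[far])
    return new_centers, new_labels
```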

Managed language for scientific computing software

- by heisen

Scientific computing is algorithm-intensive and can also be data-intensive. It often needs a lot of memory to run one analysis and has to release it before continuing with the next. Sometimes it also uses a memory pool to recycle memory for each analysis. A managed language is interesting here because it lets the developer concentrate on the application logic. Since it might need to deal with huge datasets, performance is important too. But how can we control memory and performance in a managed language?

The best way to organize WPF projects

- by Mike

Hello everybody, I've recently started developing new software in WPF, and I still don't know the best way to organize the application so as to be more productive with Visual Studio and Expression Blend. I've noticed two annoying things I'd like to solve:

1. I'm using Code Contracts in my projects, and when I run my project from Expression Blend, it launches the static analysis of the code. How can I stop that? Which project configuration does Blend use by default? I've tried disabling Code Contracts in a new configuration. That works in VS, where the static analysis is no longer launched, but it has no effect in Blend.
2. I've thought about splitting the Windows application into two parts: the first containing the WPF views (app.exe) and the second being the core of the project, with the logic code (app.core.dll). I would then open only the former project in Blend. Any thoughts about that?

Thanks in advance,
Mike

In MySQL, what is the most effective query design for joining large tables with many to many relatio

- by lighthouse65

In our application, we collect data on automotive engine performance: basically, source data on engine performance based on the engine type, the vehicle running it, and the engine design. Currently, the basis for new row inserts is an engine on/off period; we monitor performance variables based on a change in engine state from active to inactive and vice versa. The related engineState table looks like this:

    +---------+--------+--------------+---------------------+---------------------+-----------------+
    | vehicle | engine | engine_state | state_start_time    | state_end_time      | engine_variable |
    +---------+--------+--------------+---------------------+---------------------+-----------------+
    | 080025  | E01    | active       | 2008-01-24 16:19:15 | 2008-01-24 16:24:45 | 720             |
    | 080028  | E02    | inactive     | 2008-01-24 16:19:25 | 2008-01-24 16:22:17 | 304             |
    +---------+--------+--------------+---------------------+---------------------+-----------------+

For a specific analysis, we would like to analyze the table content at a row granularity of minutes, rather than the current basis of active/inactive engine state. For this, we are thinking of creating a simple productionMinute table with a row for each minute in the period we are analyzing, and joining the productionMinute and engineState tables on the date-time columns in each table. So if our period of analysis is from 2009-12-01 to 2010-02-28, we would create a new table with 129,600 rows, one for each minute of each day in that three-month period (90 days × 1,440 minutes). The first few rows of the productionMinute table:

    +-------------------+
    | production_minute |
    +-------------------+
    | 2009-12-01 00:00  |
    | 2009-12-01 00:01  |
    | 2009-12-01 00:02  |
    | 2009-12-01 00:03  |
    +-------------------+

The join between the tables would be:

    engineState AS es
    LEFT JOIN productionMinute AS pm
      ON es.state_start_time <= pm.production_minute
     AND pm.production_minute <= es.state_end_time

This join, however, brings up several issues:

- The engineState table has 5 million rows and the productionMinute table has 130,000 rows.
- When an engineState row spans more than one minute (i.e. the difference between es.state_start_time and es.state_end_time is greater than one minute), as in the example above, multiple productionMinute rows join to a single engineState row.
- When more than one engine is in operation during any given minute, also as in the example above, multiple engineState rows join to a single productionMinute row.

In testing our logic using only a small table extract (one day rather than three months for the productionMinute table), the query takes over an hour to run. While researching how to improve performance so that querying three months of data becomes feasible, our thought was to create a temporary table from the engineState table, eliminating any data that is not critical for the analysis, and to join that temporary table to the productionMinute table. We are also planning to experiment with different joins, specifically an inner join, to see whether that improves performance. What is the best query design for joining tables with the many-to-many relationship between the join predicates outlined above? What is the best join type (left/right, inner)?
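
A different technique from the range join, sketched in Python rather than SQL: walk each engineState row once and generate only the minutes it covers (every whole minute m with state_start_time <= m <= state_end_time, the same predicate as the join). The work is proportional to the total number of minutes actually covered rather than to rows × minutes. The row layout below is an assumption based on the sample table, and whether this beats an indexed SQL join on the real data is untested.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple[str, str, str, datetime, datetime]  # vehicle, engine, state, start, end


def minutes_covered(start: datetime, end: datetime) -> Iterator[datetime]:
    """Yield every whole minute m with start <= m <= end (the join predicate)."""
    m = start.replace(second=0, microsecond=0)
    if m < start:
        m += timedelta(minutes=1)  # round up to the next whole minute
    while m <= end:
        yield m
        m += timedelta(minutes=1)


def per_minute(rows: Iterable[Row]) -> Dict[datetime, List[Tuple[str, str, str]]]:
    """Map each minute to the (vehicle, engine, state) rows whose interval covers it."""
    buckets: Dict[datetime, List[Tuple[str, str, str]]] = defaultdict(list)
    for vehicle, engine, state, start, end in rows:
        for m in minutes_covered(start, end):
            buckets[m].append((vehicle, engine, state))
    return dict(buckets)
```

With the two sample rows, minutes 16:20 to 16:22 map to both engines and 16:23 to 16:24 to E01 only.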

ASP.NET MVC Validation of ViewState MAC failed

- by Kevin Pang

After publishing a new build of my ASP.NET MVC web application, I often see this exception thrown when browsing to the site:

    System.Web.Mvc.HttpAntiForgeryException: A required anti-forgery token was not supplied or was invalid.
    ---> System.Web.HttpException: Validation of viewstate MAC failed. If this application is hosted by a Web Farm or cluster, ensure that <machineKey> configuration specifies the same validationKey and validation algorithm. AutoGenerate cannot be used in a cluster.
    ---> System.Web.UI.ViewStateException: Invalid viewstate.

This exception continues to occur on every page I visit in my web application until I close Firefox. After reopening Firefox, the site works perfectly. Any idea what's going on?

Additional notes:

- I am not using any ASP.NET web controls (there are no instances of runat="server" in my application).
- If I take <%= Html.AntiForgeryToken() %> out of my pages, the problem seems to go away.

Dynamic function docstring

- by Tom Aldcroft

I'd like to write a Python function that has a dynamically created docstring. In essence, for a function func() I want func.__doc__ to be a descriptor that calls a custom __get__ function to create the docstring on request. Then help(func) should return the dynamically generated docstring.

The context here is writing a Python package that wraps a large number of command-line tools in an existing analysis package. Each tool becomes a similarly named module function (created via a function factory and inserted into the module namespace), with the function documentation and interface arguments dynamically generated via the analysis package.
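
A plain function's `__doc__` is just an attribute on the function object, so a descriptor can't be attached to it directly. A descriptor does work when it lives on a class, which suggests one workaround: make each tool a callable instance of a class whose `__doc__` is a property. The sketch below assumes that design; `make_tool`, `run` and `describe` are hypothetical stand-ins for the function factory and the analysis package. `inspect.getdoc()` goes through the same attribute lookup, and `help(tool)` should pick it up that way, but it's worth checking the rendering on your Python version.

```python
import inspect
from typing import Any, Callable, Optional


class LazyDocFunction:
    # Callable wrapper whose docstring is generated the first time it is requested.
    # (No class docstring: the __doc__ property below would replace it anyway.)

    def __init__(self, func: Callable[..., Any], doc_factory: Callable[[str], str]):
        self._func = func
        self._doc_factory = doc_factory
        self._doc_cache: Optional[str] = None
        self.__name__ = func.__name__

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self._func(*args, **kwargs)

    @property
    def __doc__(self) -> str:  # type: ignore[override]
        # Found on the class, so obj.__doc__ and inspect.getdoc(obj) both run this getter.
        if self._doc_cache is None:
            self._doc_cache = self._doc_factory(self.__name__)
        return self._doc_cache


def make_tool(name: str, run: Callable[..., Any],
              describe: Callable[[str], str]) -> LazyDocFunction:
    """Hypothetical factory: run(name, *args) executes the tool, describe(name) returns its help."""
    def call_tool(*args: Any, **kwargs: Any) -> Any:
        return run(name, *args, **kwargs)
    call_tool.__name__ = name
    return LazyDocFunction(call_tool, describe)


if __name__ == "__main__":
    tool = make_tool("example_tool", lambda name, *a: (name, a),
                     lambda name: f"Run {name}.\n\n    Built on request.")
    print(inspect.getdoc(tool))
```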

geographical deployment Vs geo load balancing SharePoint 2010

- by vrajaraman

We have a company-wide SharePoint portal planned for a few thousand users. Since the users, and their applications (hosted in SharePoint), are distributed across different countries, we would like to weigh geo-deployment against geo load balancing. Please share your input. We are aware of the following options:

- Geo SharePoint cluster: farms at the central site and the other sites, with databases in each region; two database clusters, synced using log shipping, SAN sync, or SQL Server 2008 features such as database mirroring.
- Load balancing using URLs and some third-party tool, with all farms, sites, and databases centralised.

The benefits we expect:

1. High availability
2. Disaster recovery management
3. Maintenance

I may have missed some of the points that should be covered.

Clustering [assessment] algorithm with distance matrix as an input

- by Max

Can anyone suggest a clustering algorithm that can work with a distance matrix as input? Or an algorithm that can assess the "goodness" of a clustering, also based on the distance matrix?

At the moment I'm using a modification of Kruskal's algorithm (http://en.wikipedia.org/wiki/Kruskal%27s_algorithm) to split the data into two clusters. It has a problem, though: when the data has no distinct clusters, the algorithm still creates two clusters, one containing a single element and the other containing all the rest. In this case I would rather have one cluster containing all the elements and another that is empty.

Are there any algorithms capable of this type of clustering? Are there any algorithms that can estimate how well the clustering was done, or, even better, how many clusters there are in the data? The algorithms should work only with distance (similarity) matrices as input.
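
Two standard techniques that need nothing but the matrix, sketched in Python. First, single-linkage clustering cut at a distance threshold: the clusters are the connected components of the graph whose edges are pairs with distance <= threshold, which is the same as removing every minimum-spanning-tree edge longer than the threshold (the Kruskal view). If no gap exceeds the threshold, everything stays in one cluster. Second, the mean silhouette width, which scores a clustering from the matrix alone and is undefined for a single cluster. The threshold is an assumption you would have to tune; one common approach is to try several and keep the one with the highest silhouette.

```python
from typing import List, Optional, Sequence

Matrix = Sequence[Sequence[float]]


def single_linkage_threshold(dist: Matrix, threshold: float) -> List[int]:
    """Cluster labels = connected components of the graph {(i, j): dist[i][j] <= threshold}."""
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)
    roots: dict = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def silhouette(dist: Matrix, labels: Sequence[int]) -> Optional[float]:
    """Mean silhouette width from the distance matrix; None if there are < 2 clusters."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return None
    n = len(labels)
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton: s(i) = 0 by the usual convention
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
                for c in clusters if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n
```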

C# - Data Clustering approach

- by Brett

Hi all, I am writing a program in C# in which I have a set of 200 points displayed on an image. The points tend to cluster in various regions, and I am looking for a way to identify those clusters, for example by drawing a circle or ellipse around the clustered points. Has anyone seen a way to do this? I have heard about k-means clustering, but I am not sure how to implement it in C#. Any favorite implementations out there?

Cheers,
Brett
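
Once each point has a cluster label (from k-means or anything else), the circle itself is simple arithmetic. The sketch is in Python for brevity, and the same few lines port directly to C#. It uses the cluster centroid as the centre and the distance to the farthest member as the radius. Every member therefore lies inside the circle, though it is not the smallest possible enclosing circle. To draw it with System.Drawing, you would pass the bounding square (cx - r, cy - r, 2r, 2r) to Graphics.DrawEllipse.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def cluster_circles(points: Sequence[Point],
                    labels: Sequence[int]) -> Dict[int, Tuple[float, float, float]]:
    """For each cluster label, return (center_x, center_y, radius) covering all its points."""
    groups: Dict[int, List[Point]] = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    circles = {}
    for lab, members in groups.items():
        cx = sum(x for x, _ in members) / len(members)  # centroid
        cy = sum(y for _, y in members) / len(members)
        r = max(math.dist((cx, cy), m) for m in members)  # farthest member
        circles[lab] = (cx, cy, r)
    return circles
```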

Tips on deploying RoR

- by notnoop

How can I go about deploying a Rails app on a cluster of Amazon EC2 servers? Any recommended guides? I maintain a RoR app (currently hosted on Heroku) that uses a DB and DelayedJob. The app has a large footprint and will most likely need to be distributed across a cluster. Any tips would be appreciated. Are there Amazon AMIs that replicate some of Heroku's features (especially DJ)?

P.S. I'm quite a Ruby newbie.

AJAX with Web services and ASP.NET SessionState

- by needhelp1

We have an application that uses ScriptManager to generate a client-side proxy that makes AJAX calls to web services. The web services being invoked live in a separate appDomain (a separate cluster altogether). The problem is that our application uses a State Server for storing session state, and I want the web services to be able to access the session too. First off, does anyone see anything wrong with the client making web service calls to a separate cluster (we're hoping this is a better approach for scalability)? I was thinking that whenever the session dictionary is updated in one appDomain, the session in the other appDomain (the web service one) could be updated automatically too. I don't know how to do this; it's only theoretical. What do others think? Thanks!

Simple database design and LINQ

- by Anders Svensson

I have very little experience designing databases, and now I want to create a very simple database that does the same thing I previously had in XML. Here's the XML:

    <services>
      <service type="writing">
        <small>125</small>
        <medium>100</medium>
        <large>60</large>
        <xlarge>30</xlarge>
      </service>
      <service type="analysis">
        <small>56</small>
        <medium>104</medium>
        <large>200</large>
        <xlarge>250</xlarge>
      </service>
    </services>

Now I wanted to create the same thing in a SQL database, and started like this (four columns and two rows):

    ServiceType | Small | Medium | Large
    ------------+-------+--------+------
    Writing     |   125 |    100 |    60
    Analysis    |    56 |    104 |   200

This didn't work too well, since I then wanted to use LINQ to select, say, the Large value for Writing (60). But, as far as I know, I couldn't do this with LINQ using a variable for the size (see the parameters in the method below). I could only do that if I had a column like "Size" with Small, Medium, and Large as its values. But that doesn't feel right either, because then I would get several rows with ServiceType = Writing (three in this case, one for each size), and the same for Analysis. And if I were to add more service types, I would have to do the same. Simply repetitive...

Is there any smart way to do this using relationships or something? Using the second design above (although it's not good), I could use the following LINQ to select a value with parameters sent to the method:

    protected int GetHourRateDB(string serviceType, Size size)
    {
        using (CalculatorLinqDataContext context = new CalculatorLinqDataContext())
        {
            var data = (from calculatorData in context.CalculatorDatas
                        where calculatorData.Service == serviceType
                           && calculatorData.Size == size.ToString()
                        select calculatorData).Single();
            return data.Hours;
        }
    }

But if there is a better design, could you please also describe how to do the same selection using LINQ with that design? Please keep in mind that I am a rookie at database design, so please be as explicit and pedagogical as possible :-)

Thanks!
Anders
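
A sketch of one conventional normalized layout: a ServiceType table, a Size table, and a Rate table with one row per (service, size) pair holding the hours. One row per combination is expected in this design, because the names live only once in their own tables. The sketch uses SQLite through Python's sqlite3 to show the schema and a parameterized lookup; all table and column names are hypothetical. In LINQ to SQL, the same lookup would be a join across the three mapped tables, filtered on both names.

```python
import sqlite3
from typing import Dict

SCHEMA = """
CREATE TABLE ServiceType (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE Size        (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
-- One row per (service, size) pair: the hours for that combination.
CREATE TABLE Rate (
    ServiceTypeId INTEGER NOT NULL REFERENCES ServiceType(Id),
    SizeId        INTEGER NOT NULL REFERENCES Size(Id),
    Hours         INTEGER NOT NULL,
    PRIMARY KEY (ServiceTypeId, SizeId)
);
"""

# The values from the XML in the question.
RATES: Dict[str, Dict[str, int]] = {
    "writing": {"small": 125, "medium": 100, "large": 60, "xlarge": 30},
    "analysis": {"small": 56, "medium": 104, "large": 200, "xlarge": 250},
}


def build(conn: sqlite3.Connection, data: Dict[str, Dict[str, int]]) -> None:
    conn.executescript(SCHEMA)
    for service, sizes in data.items():
        conn.execute("INSERT INTO ServiceType (Name) VALUES (?)", (service,))
        for size, hours in sizes.items():
            conn.execute("INSERT OR IGNORE INTO Size (Name) VALUES (?)", (size,))
            conn.execute(
                "INSERT INTO Rate (ServiceTypeId, SizeId, Hours) "
                "SELECT st.Id, sz.Id, ? FROM ServiceType st, Size sz "
                "WHERE st.Name = ? AND sz.Name = ?",
                (hours, service, size),
            )


def get_hour_rate(conn: sqlite3.Connection, service: str, size: str) -> int:
    """Both the service and the size are parameters, as in GetHourRateDB."""
    row = conn.execute(
        "SELECT r.Hours FROM Rate r "
        "JOIN ServiceType st ON st.Id = r.ServiceTypeId "
        "JOIN Size sz ON sz.Id = r.SizeId "
        "WHERE st.Name = ? AND sz.Name = ?",
        (service, size),
    ).fetchone()
    if row is None:
        raise KeyError((service, size))
    return row[0]
```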

Segmenting a double array of labels

- by Ami

The Problem: I have a large double (two-dimensional) array populated with various labels. Each element (cell) in the array contains a set of labels, and some elements may be empty. I need an algorithm to cluster elements of the array into discrete segments. A segment is defined as a set of pixels that are adjacent within the array and one label that all the pixels in the segment have in common. (Diagonal adjacency doesn't count, and I'm not clustering empty cells.)

    |-------|-------|------|
    | Jane  | Joe   |      |
    | Jack  | Jane  |      |
    |-------|-------|------|
    | Jane  | Jane  |      |
    |       | Joe   |      |
    |-------|-------|------|
    |       | Jack  | Jane |
    |       | Joe   |      |
    |-------|-------|------|

In the above arrangement of labels distributed over nine elements, the largest cluster is the "Jane" cluster occupying the four upper-left cells.

What I've Considered: I've considered iterating through every label of every cell in the array and testing whether the cell-label combination under inspection can be associated with a pre-existing segment. If it cannot, it becomes the first member of a new segment; if it can, it is added to that segment. Of course, to make this method reasonable I'd have to implement an elaborate hashing system. I'd have to keep track of all the cell-label combinations that are adjacent to pre-existing segments and lie in the path of the incrementing indices iterating through the array. This hashing would avoid having to iterate through every pixel in every pre-existing segment to find an adjacency.

Why I Don't Like It: As is, the above algorithm doesn't take into consideration the case where an element can be associated with two distinct segments, one in the horizontal direction and one in the vertical direction. To handle these cases properly, I would need to implement a test for this specific case and then a method that both associates the element under inspection with a segment and concatenates the two adjacent identical segments. On the whole, this method and the intricate hashing system it would require feel very inelegant. Additionally, I really only care about finding the large segments in the array, and I'm much more concerned with the speed of the algorithm than with the accuracy of the segmentation, so I'm looking for a better way. I assume there is some stochastic method for doing this that I haven't thought of. Any suggestions?
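
One deterministic alternative (not a stochastic one) is connected-component labelling by flood fill, run per label. Starting from each unvisited (cell, label) pair, a breadth-first search follows 4-neighbours that carry the same label. Because a segment is grown completely before moving on, the "two segments that need merging" case never arises, and each (cell, label) pair is visited once. Sketch in Python, assuming the array is a list of rows of label sets:

```python
from collections import deque
from typing import List, Set, Tuple

Grid = List[List[Set[str]]]
Cell = Tuple[int, int]


def segments(grid: Grid) -> List[Tuple[str, List[Cell]]]:
    """Every 4-connected region of cells sharing a label, as (label, cells)."""
    rows = len(grid)
    cols = len(grid[0]) if grid else 0
    seen: Set[Tuple[int, int, str]] = set()
    found: List[Tuple[str, List[Cell]]] = []
    for r in range(rows):
        for c in range(cols):
            for label in grid[r][c]:
                if (r, c, label) in seen:
                    continue
                # Breadth-first flood fill from this cell, following only this label.
                seen.add((r, c, label))
                queue = deque([(r, c)])
                cells: List[Cell] = []
                while queue:
                    y, x = queue.popleft()
                    cells.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and label in grid[ny][nx] and (ny, nx, label) not in seen):
                            seen.add((ny, nx, label))
                            queue.append((ny, nx))
                found.append((label, cells))
    return found
```

On the example grid this finds five segments, and the largest is the four-cell "Jane" block; to keep only large segments, filter the result by `len(cells)`.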

Finding C++ static initialization order problems

- by Fred Larson

We've run into some problems with the static initialization order fiasco, and I'm looking for ways to comb through a whole lot of code to find possible occurrences. Any suggestions on how to do this efficiently?

Edit: I'm getting some good answers on how to SOLVE the static initialization order problem, but that's not really my question. I'd like to know how to FIND objects that are subject to this problem. Evan's answer seems to be the best so far in this regard; I don't think we can use Valgrind, but we may have memory analysis tools that could perform a similar function. That would catch problems only where the initialization order is wrong for a given build, and the order can change with each build. Perhaps there's a static analysis tool that would catch this. Our platform is the IBM XL C/C++ compiler running on AIX.

How do you measure latency in low-latency environments?

- by Ajaxx

Here's the setup: your system receives a stream of data containing discrete messages (usually 32-128 bytes per message). As part of your processing pipeline, each message passes through two physically separate applications, which exchange the data using a low-latency approach (such as messaging over UDP, or RDMA), and finally on to a client via the same mechanism. Assuming you can inject yourself at any level, including wire-protocol analysis, what tools and/or techniques would you use to measure the latency of your system? As part of this, I'm assuming that every message delivered to the system results in a corresponding (though not equivalent) message being pushed through the system and delivered to the client. The only tool I've seen on the market like this is TS-Associates TipOff. I'm sure that with the right access you could probably measure the same information using a wire analysis tool (à la Wireshark) and the right dissectors, but is this the right approach, or are there any commodity solutions I can use?
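
Whatever does the capturing (wire taps, application logs), the analysis step usually comes down to correlating timestamps per message and diffing them hop by hop. The Python sketch below shows that step under two loud assumptions: a correlation id survives each hop, even though the output message differs from the input, and clocks on different machines are synchronised (e.g. via PTP) or all capture points share one clock. The capture-point names are hypothetical.

```python
from typing import Dict, List, Sequence

Captures = Dict[str, Dict[str, int]]  # capture point -> {message id -> timestamp in ns}


def hop_latencies(captures: Captures, path: Sequence[str]) -> Dict[str, List[int]]:
    """Sorted per-message latencies (ns) between consecutive capture points on `path`.

    Only messages seen at both ends of a hop are counted; anything missing at the
    far end (dropped or filtered) is left out of that hop's sample.
    """
    result: Dict[str, List[int]] = {}
    for src, dst in zip(path, path[1:]):
        before, after = captures[src], captures[dst]
        common = before.keys() & after.keys()
        result[f"{src}->{dst}"] = sorted(after[m] - before[m] for m in common)
    return result


def percentile(sorted_ns: List[int], p: int) -> int:
    """Nearest-rank percentile of an already sorted, non-empty sample (0 < p <= 100)."""
    if not sorted_ns or not 0 < p <= 100:
        raise ValueError("need a non-empty sample and 0 < p <= 100")
    return sorted_ns[(p * len(sorted_ns) + 99) // 100 - 1]
```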

complex mysql query problem

- by Scarface

Hey guys, I have a query that selects the data and organizes it, but not in the correct order. What I want to do is select all of a user's comments for that week, grouped by topic, and then sort the topic groups by the latest timestamp of the comments in each group. My current query selects the right data, but in a seemingly random order. Does anyone have any ideas?

    SELECT * FROM (
        SELECT topic.topic_title, topic.topic_id
        FROM comments
        JOIN topic ON topic.topic_id = comments.topic_id
        WHERE comments.user = '$user'
          AND comments.timestamp > $week
        ORDER BY comments.timestamp DESC
    ) derived_table
    GROUP BY topic_id
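
SQL doesn't promise that a derived table keeps the order given by its inner ORDER BY. MySQL's non-standard GROUP BY with non-aggregated columns also picks an arbitrary row per group. So the ordering in the query above is not guaranteed. One common approach is to aggregate and order at the outer level: group by topic, take MAX(timestamp) per group, and order by that. The sketch below runs the query against SQLite through Python's sqlite3, with made-up rows and bound parameters in place of the `$user`/`$week` interpolation. The SQL is intended to carry over to MySQL but hasn't been tested there.

```python
import sqlite3
from typing import List, Tuple

QUERY = """
SELECT t.topic_id, t.topic_title, MAX(c.timestamp) AS latest
FROM comments AS c
JOIN topic AS t ON t.topic_id = c.topic_id
WHERE c.user = ? AND c.timestamp > ?
GROUP BY t.topic_id, t.topic_title
ORDER BY latest DESC
"""


def topics_by_latest_comment(conn: sqlite3.Connection, user: str,
                             since: int) -> List[Tuple[int, str, int]]:
    # Bound parameters (?) replace string interpolation and avoid SQL injection.
    return conn.execute(QUERY, (user, since)).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """In-memory database with made-up rows, just to exercise the query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE topic (topic_id INTEGER PRIMARY KEY, topic_title TEXT);
        CREATE TABLE comments (user TEXT, topic_id INTEGER, timestamp INTEGER);
        INSERT INTO topic VALUES (1, 'Cats'), (2, 'Dogs'), (3, 'Birds');
        INSERT INTO comments VALUES
            ('bob', 1, 100), ('bob', 2, 300), ('bob', 1, 250),
            ('bob', 3, 50), ('alice', 3, 400);
    """)
    return conn
```

If the individual comments are also needed, they can be fetched in a second query, or joined back to this per-topic result and ordered by `latest DESC, timestamp DESC`.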

Can I ask Postgresql to ignore errors within a transaction

- by fmark

I use PostgreSQL with the PostGIS extensions for ad-hoc spatial analysis. I generally construct and issue SQL queries by hand from within psql. I always wrap an analysis session in a transaction, so if I issue a destructive query I can roll it back. However, when I issue a query that contains an error, it aborts the transaction. Any further queries produce the following error:

    ERROR: current transaction is aborted, commands ignored until end of transaction block

Is there a way to turn this behaviour off? It is tiresome to roll back the transaction and rerun previous queries every time I make a typo.

Get highest frequency terms from Lucene index

- by Julia

Hello! I need to extract the terms with the highest frequencies from several Lucene indexes, to use them for some semantic analysis. So I want to get maybe the top 30 most frequent terms (I haven't decided on the threshold yet; I'll analyze the results) and their per-index counts. I am aware that I might lose some precision because of potentially dropped duplicates, but for now let's say I'm OK with that. For the proposed solutions, speed is (needless to say, maybe) not important, since I would be doing static analysis. I would put the emphasis on simplicity of implementation, because I'm not very skilled with Lucene (and not a programming guru either :/) and can't wrap my mind around many of its concepts. I can't find any code samples for something similar, so I will very much appreciate any concrete advice (code, pseudocode, links to code samples...)! Thank you!
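
The per-index extraction itself depends on the Lucene API and isn't shown here. Assuming each index's terms and counts have already been dumped into a dictionary (for example by walking the index's term enumeration), the merge step is short. The Python sketch below ranks terms by total count across indexes and keeps the per-index breakdown; the index names and counts are made up.

```python
from collections import Counter
from typing import Dict, List, Tuple

PerIndex = Dict[str, Dict[str, int]]  # index name -> {term -> count in that index}


def top_terms(per_index: PerIndex, n: int = 30) -> List[Tuple[str, int, Dict[str, int]]]:
    """(term, total count, {index name: count}) for the n most frequent terms overall."""
    total: Counter = Counter()
    for counts in per_index.values():
        total.update(counts)  # adds this index's counts to the running totals
    # Highest total first; ties broken alphabetically so the output is stable.
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
    return [(term, freq, {name: counts.get(term, 0) for name, counts in per_index.items()})
            for term, freq in ranked]
```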

Search Results

Search found 4291 results on 172 pages for 'cluster analysis'.
