Search Results

Search found 4291 results on 172 pages for 'cluster analysis'.

NOT LIKE not working on comparison to a column

- by rodling

The data is fairly large and the query takes a few minutes to run every time, so debugging this problem is taking a lot of time. When I run like concat('%',T.item,'%') on smaller data, it seems to identify items properly. However, when I run it on the main DB (the code shown), it still shows many (maybe even all) of the exceptions.

EDIT: it seems that when I add NOT it stops identifying items.

    select distinct T.comment
    from (select comment, source, item
          from data, non_informative
          where ticker != "O" and source != 7 and source != 6) as T
    where T.comment not like concat('%',T.item,'%')
    order by T.comment;

comment and source are in data; item is in non_informative.

Some items from T.item: 'Stock Analysis -', '#InsideTrades', 'IIROC Trade'

Example comment which should be removed: '#InsideTrades #4 | MACNAB CRAIG (Director,Officer,Chief Executive Officer): Filed Form 4 for $NNN (NATIONAL RETA'

I can't figure out why it shows all the items.
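
One possible reading of the symptom, offered as an assumption rather than a confirmed diagnosis: `from data, non_informative` is a cross join, so every comment is paired with every item, and a comment passes `not like` as soon as it fails to match any single item. The sketch below reproduces that with Python's sqlite3 module and contrasts it with a NOT EXISTS form. SQLite's `||` operator stands in for MySQL's concat, the rows are made-up sample data, and the ticker/source filters are left out.

```python
import sqlite3

# In-memory database with made-up sample rows (not the asker's data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (comment TEXT);
    CREATE TABLE non_informative (item TEXT);
    INSERT INTO data VALUES
        ('#InsideTrades #4 | MACNAB CRAIG: Filed Form 4'),
        ('Quarterly earnings beat expectations');
    INSERT INTO non_informative VALUES
        ('Stock Analysis -'), ('#InsideTrades'), ('IIROC Trade');
""")

# Cross join + NOT LIKE: a comment is kept if it fails to match ANY one item.
cross_join = [row[0] for row in conn.execute("""
    SELECT DISTINCT d.comment
    FROM data AS d, non_informative AS n
    WHERE d.comment NOT LIKE '%' || n.item || '%'
    ORDER BY d.comment
""")]

# NOT EXISTS: a comment is kept only if it matches NO item at all.
not_exists = [row[0] for row in conn.execute("""
    SELECT DISTINCT d.comment
    FROM data AS d
    WHERE NOT EXISTS (
        SELECT 1 FROM non_informative AS n
        WHERE d.comment LIKE '%' || n.item || '%'
    )
    ORDER BY d.comment
""")]

print(len(cross_join), "rows from the cross join")
print(len(not_exists), "rows from NOT EXISTS")
```

In this sample the cross-join query still returns the '#InsideTrades' comment, because that comment does not contain 'Stock Analysis -', while the NOT EXISTS query drops it.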

How should I capture clickstream data?

- by editor

I'd like to start using clickstream analysis to improve a dynamic site's user experience. I'd like to rule out two options: parameterizing URLs (index.php?src=http://www.example.com) and immediate database logging. The former makes for pretty ugly URLs and isn't great for SEO, and the latter might slow down page rendering when there are lots of concurrent users. Assuming these aren't viable options, I think I'm left with doing an asynchronous POST to a server-side script that runs a database query and returns a 204 (No Content) response. Is this the best option for capturing clickstream data?

Rails: How can I log all requests which take more than 4s to execute?

- by Fedyashev Nikita

I have a web app hosted in a cloud environment which can be expanded to multiple web nodes to serve higher load. What I need to do is detect the situation when we get more and more HTTP requests (assets are stored remotely). How can I do that? The problem I see is that if we have more requests than the mongrel cluster can handle, the queue will grow, and in our Rails app we can only start counting after mongrel receives the request from the balancer. Any recommendations?

python parallel computing: split keyspace to give each node a range to work on

- by MatToufoutu

My question is rather complicated for me to explain, as I'm not really good at maths, but I'll try to be as clear as possible. I'm trying to code a cluster in Python which will generate words from a given charset (e.g. with lowercase: aaaa, aaab, aaac, ..., zzzz) and perform various operations on them. I'm trying to work out how to calculate, given the charset and the number of nodes, what range each node should work on (e.g. node1: aaaa-azzz, node2: baaa-czzz, node3: daaa-ezzz, ...). Is it possible to write an algorithm that computes this, and if so, how could I implement it in Python? I really don't know how to do that, so any help would be much appreciated.
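
One way to approach this, sketched under the assumption that every word has the same fixed length: treat each word as a number written in base len(charset), split the range of indices evenly across the nodes, and convert the boundary indices back to words. The function names index_to_word and split_keyspace are hypothetical, chosen for this illustration.

```python
import string


def index_to_word(index, charset, length):
    """Convert a 0-based index into the fixed-length word at that position."""
    base = len(charset)
    chars = []
    for _ in range(length):
        index, digit = divmod(index, base)
        chars.append(charset[digit])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def split_keyspace(charset, length, nodes):
    """Return one (first_word, last_word) pair per node, covering every word."""
    total = len(charset) ** length
    chunk, extra = divmod(total, nodes)
    ranges = []
    start = 0
    for node in range(nodes):
        # The first `extra` nodes take one additional word each.
        size = chunk + (1 if node < extra else 0)
        if size == 0:
            break  # more nodes than words: remaining nodes get nothing
        end = start + size - 1
        ranges.append((index_to_word(start, charset, length),
                       index_to_word(end, charset, length)))
        start = end + 1
    return ranges


for first, last in split_keyspace(string.ascii_lowercase, 4, 13):
    print(first, "-", last)
```

Because each node only needs its start and end index, a node can generate its own words by counting from one index to the other and calling index_to_word, without any node holding the whole keyspace in memory.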

Maintaining traceability up-to-date as project evolves

- by Catalin Piti?

During various projects, I needed to make sure that the use case model I developed during the analysis phase covered the requirements of the project. For that, I was able to maintain some degree of traceability between requirement statements (uniquely identified) and use cases (also uniquely identified). In some cases, enabling traceability required some additional effort that I considered (and that later proved) to be a good investment. The biggest problem I faced was maintaining this traceability later, when things started to change (as a result of change requests, or as a result of use case changes). Any ideas of best practices for traceability maintenance? (It can apply to other items in the project, e.g. use cases and test cases, or requirements and acceptance test cases.)

Later edit: Tools might help, but they can't detect gaps or errors in traceability. Navigation... maybe, but there is no guarantee that the traceability is up-to-date or correct after applying the changes.

Use C function in C++ program; "multiply-defined" error

- by eom

I am trying to use this code for the Porter stemming algorithm in a C++ program I've already written. I followed the instructions near the end of the file for using the code as a separate module. I created a file, stem.c, that ends after the definition and has

    extern int stem(char * p, int i, int j) ...

It worked fine in Xcode but it does not work for me on Unix with gcc 4.1.1--strange because usually I have no problem moving between the two. I get the error

    ld: fatal: symbol `stem(char*, int, int)' is multiply-defined:
        (file /var/tmp//ccrWWlnb.o type=FUNC; file /var/tmp//cc6rUXka.o type=FUNC);
    ld: fatal: File processing errors. No output written to cluster

I've looked online and it seems like there are many things I could have wrong, but I'm not sure what combination of a header file, extern "C", etc. would work.

DBA's say no to SQL Server DTC?

- by NabilS

I am trying to get our DBAs to enable DTC on a SQL Server 2005 cluster. Unfortunately they keep refusing. Their argument is that they would need to set up a dedicated host for DTC (which could take months!) as it is not a matter of ticking a few boxes. Is this true? How intrusive is DTC in a shared environment such as a SQL farm? Do I have an argument against this? Thanks

Python - a clean approach to this problem?

- by Seafoid

Hi, I am having trouble picking the best data structure for solving a problem. The problem is as follows. I have a nested list of identity codes where the sublists are of varying length:

    li = [['abc', 'ghi', 'lmn'], ['kop'], ['hgi', 'ghy']]

I have a file with two entries on each line: an identity code and a number.

    abc 2.93
    ghi 3.87
    lmn 5.96

Each sublist represents a cluster. I wish to select the i.d. from each sublist with the highest number associated with it, append that i.d. to a new list and ultimately write it to a new file. What data structure should the file with numbers be read in as? Also, how would you iterate over said data structure to return the i.d. with the highest number that matches the i.d. within a sublist? Thanks, S :-)
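
A minimal sketch of one possible approach: read the file into a dict that maps each identity code to its number, then take max() over each sublist using the dict lookup as the key. The helper names read_scores and best_per_cluster are hypothetical, an in-memory io.StringIO stands in for the real file, and the numbers for kop, hgi and ghy are invented for the example (only the first three lines appear in the question).

```python
import io


def read_scores(lines):
    """Build {identity_code: number} from lines of 'code number'."""
    scores = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        code, value = parts
        scores[code] = float(value)
    return scores


def best_per_cluster(clusters, scores):
    """Pick the code with the highest number from each sublist."""
    best = []
    for cluster in clusters:
        # Ignore codes that have no number in the file.
        known = [code for code in cluster if code in scores]
        if known:
            best.append(max(known, key=scores.get))
    return best


li = [['abc', 'ghi', 'lmn'], ['kop'], ['hgi', 'ghy']]

# Stand-in for the real file; the last three values are made up.
sample_file = io.StringIO(
    "abc 2.93\nghi 3.87\nlmn 5.96\nkop 1.10\nhgi 0.50\nghy 4.20\n"
)

scores = read_scores(sample_file)
winners = best_per_cluster(li, scores)
print(winners)
```

Writing the result out is then a matter of opening the output file and writing one entry of winners per line.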

Value of A.S. Degree in Programming

- by MiseryIndex

I am in quite an unusual family situation. For the next two years, I have to stay at home, where the only post-secondary institution available is a community college. After two years, I will have to start earning a living. I do not really have any real-world programming experience to put on my résumé. I did some not-too-advanced work in PHP for family and friends, and I’m pretty sure that I want to program for a living. I have been working on an A.S. Degree in Computer Programming and Analysis since fall. My question regarding the degree is: is it worth anything to potential employers, or am I just wasting my time? Is there a better way to spend the coming two years? If I could get an internship and some experience, would that hold more weight than a two-year degree without experience?

User Interface. Multiple select with priority.

- by Andrew Florko

I'm designing a user interface and want to ask for your advice on how to make it more user-friendly. Please share any suggestions, and if you have ever seen an implementation of something similar, please share the link. The setting is a university: there are 40+ specialities grouped into 5 faculties. The user chooses several he is interested in and then orders them by priority. For example, I am interested in "programming microcontrollers", "system analysis" and "experimental physics". I must find them quickly in "programming faculty", select them and then order them by what I prefer most and what I prefer less than the others I selected. Any ideas welcome :)

How can I neatly clean my R workspace while preserving certain objects?

- by briandk

Suppose I'm messing about with some data by binding vectors together, as I'm wont to do on a lazy Sunday afternoon.

    x <- rnorm(25, mean = 65, sd = 10)
    y <- rnorm(25, mean = 75, sd = 7)
    z <- 1:25
    dd <- data.frame(mscore = x, vscore = y, caseid = z)

I've now got my new dataframe dd, which is wonderful. But there's also still the detritus from my prior slicings and dicings:

    > ls()
    [1] "dd" "x" "y" "z"

What's a simple way to clean up my workspace if I no longer need my "source" columns, but I want to keep the dataframe? That is, now that I'm done manipulating data I'd like to just have dd and none of the smaller variables that might inadvertently mask further analysis:

    > ls()
    [1] "dd"

I feel like the solution must be of the form rm(ls[ -(dd) ]) or something, but I can't quite figure out how to say "please clean up everything BUT the following objects."

Can not connect to SSIS access denied

- by Pramodtech

I am facing an issue while connecting to SSIS through Management Studio. I'm able to connect to the SQL engine and Analysis Services, but not to SSIS. I use Windows authentication. I tried the steps given at http://msdn.microsoft.com/en-us/library/aa337083(SQL.90).aspx but they did not help. On one of the forums I saw that one needs to restart the MSDTC service. Do I need to do that? I ask because my SQL admin said I need to justify it by assuring that it doesn't affect anything else. Moreover, we didn't find a way to restart the service; where can I do that? Please help. Thanks.

Background subtracting in MATLAB

- by eiphyomin

I'm looking to do background subtraction on an image. I'm new to MATLAB and new to image processing/analysis, so sorry if any of this sounds stupid. 1) Other than imsubtract(), are there other ways to do background subtraction (besides comparing one image to another)? 2) In the MathWorks explanation for imsubtract(), why do they make their structuring element a disk? This seems rather difficult so far because every time I try something, I end up not only subtracting the noisy background but also losing the parts of the image I want to look at!

How does Contract.Exists add value?

- by Scott Bilas

I am just starting to learn about the code contracts library that comes standard with VS2010. One thing I am running into right away is what some of the contract clauses really mean. For example, how are these two statements different?

    Contract.Requires(!mycollection.Any(a => a.ID == newID));
    Contract.Requires(!Contract.Exists(mycollection, a => a.ID == newID));

In other words, what does Contract.Exists do for practical purposes, either for a developer using my function, or for the static code analysis system?

Detecting touch area on Android

- by HappyAppDeveloper

Is it possible to detect every pixel being touched? More specifically, when the user touches the screen, is it possible to track all the x-y coordinates of the cluster of points touched by the user? How can I tell the difference between when users are drawing with their thumb and when they are drawing with the tip of a finger? I would like to reflect the brush difference depending on how users touch the screen, and would also like to track x-y coordinates of all the pixels being touched over time. Thanks so much in advance for any help.

c++ Sorting a vector based on values of other vector, or what's faster?

- by pollux

Hi, there are a couple of other posts about sorting a vector A based on values in another vector B. Most of the other answers suggest creating a struct or a class to combine the values into one object and using std::sort. I'm curious about the performance of such solutions, as I need to optimize code which uses bubble sort to sort these two vectors. I'm thinking of using a vector<pair<int,int>> and sorting that. I'm working on a blob-tracking application (image analysis) where I try to match previously tracked blobs against newly detected blobs in video frames: I check each frame against a couple of previously tracked frames and, of course, against the blobs I found in previous frames. I'm doing this 60 times per second (the speed of my webcam). Any advice on optimizing this is appreciated. The code I'm trying to optimize can be seen here: http://code.google.com/p/projectknave/source/browse/trunk/knaveAddons/ofxBlobTracker/ofCvBlobTracker.cpp?spec=svn313&r=313 Thanks
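
One common alternative to merging both vectors into a single struct is to sort a list of indices by the key vector and then apply that order to both sequences. The sketch below shows the idea in Python purely as an illustration; the question is about C++, where the same shape would be a std::sort over indices or over the vector<pair<int,int>> mentioned above. The function name sort_by_keys is hypothetical, and the sketch makes no claim about relative performance.

```python
def sort_by_keys(values, keys):
    """Reorder `values` and `keys` together so that `keys` ends up ascending."""
    if len(values) != len(keys):
        raise ValueError("values and keys must have the same length")
    # Sort the positions 0..n-1 by the key found at each position.
    order = sorted(range(len(keys)), key=keys.__getitem__)
    # Apply the same permutation to both sequences.
    return [values[i] for i in order], [keys[i] for i in order]


blobs = ["blob_a", "blob_b", "blob_c"]
distances = [3, 1, 2]
sorted_blobs, sorted_distances = sort_by_keys(blobs, distances)
print(sorted_blobs, sorted_distances)
```

Python's sorted() is stable, so entries with equal keys keep their original relative order.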

Handling null values with PowerShell dates

- by Tim Ferrill

I'm working on a module to pull data from Oracle into a PowerShell data table, so I can automate some analysis and perform various actions based on the results. Everything seems to be working, and I'm casting columns into specific types based on the column type in Oracle. The problem I'm having has to do with null dates. I can't seem to find a good way to capture that a date column in Oracle has a null value. Is there any way to cast a [datetime] as null or empty?

Writing Great Software

- by 01010011

Hi, I'm currently reading Head First's Object Oriented Analysis and Design. The book states that to write great software (i.e. software that is well-designed, well-coded, easy to maintain, reuse, and extend) you need to do three things:

1) First, make sure the software does everything the customer wants it to do.
2) Once step 1 is completed, apply Object Oriented principles and techniques to eliminate any duplicate code that might have slipped in.
3) Once steps 1 and 2 are complete, apply design patterns to make sure the software is maintainable and reusable for years to come.

My question is, do you follow these steps when developing great software? If not, what steps do you usually follow in order to ensure it's well-designed, well-coded, easy to maintain, reuse and extend?

Will an IO blocked process show 100% CPU utilization in 'top' output?

- by Alex Stoddard

I have an analysis that can be parallelized over a varying number of processes. It is expected that things will be both IO and CPU intensive (very high throughput short-read DNA alignment, if anyone is curious). The system running this is a 48-core Linux server. The question is how to determine the optimum number of processes such that total throughput is maximized. At some point the processes will presumably become IO bound, such that adding more processes will be of no benefit and possibly detrimental. Can I tell from standard system monitoring tools when that point has been reached? Would the output of top (or maybe a different tool) enable me to distinguish between an IO-bound and a CPU-bound process? I suspect that a process blocked on IO might still show 100% CPU utilization.

Efficient job progress update in web application

- by Endru6

Hi, I'm creating a web application (Django in my case, but I think the question is more general) that administers a cluster of workers doing queued jobs, and there is a need to track each job's progress. When I've done it using a database UPDATE (PostgreSQL in this case), it severely hits database performance, because each UPDATE creates a new row in a table, and in my case only vacuuming the DB removes obsolete rows. With 30 jobs running and reporting progress every minute, the DB may require vacuuming every 10 days (and that means huge slowdowns on the front end for all the employees working with the system). Because the progress information isn't critical, i.e. it doesn't have to be persistent, how would you do the progress updates from jobs without the overhead that a database implies? There are 30 worker servers, each doing 1 or 2 jobs simultaneously, 1 front-end server which serves a web application to users, and 1 database server.

ExecutorService to read data from database in chunks and run process on them

- by TazMan

I'm trying to write a process that reads data from a database and uploads it to a cloud datastore. How can I decide on the partitioning strategy for the data? I want to query the table in chunks and process each chunk in 10 threads. Each thread will basically send its data to an individual node on a 10-node cluster in the cloud. Where in the multithreaded code below would the data query go, so that it extracts the data and sends 10 concurrent upload requests to the cloud?

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class Caller {
        public static void main(String[] args) {
            ExecutorService executor = Executors.newFixedThreadPool(10);
            for (int i = 0; i < 10; i++) {
                Runnable worker = new DomainCDCProcessor(i);
                executor.execute(worker);
            }
            executor.shutdown();
            while (!executor.isTerminated()) {
            }
            System.out.println("Finished all threads");
        }
    }
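
As a rough illustration of where the query could sit, the sketch below fetches (here: slices) the rows into fixed-size chunks in the submitting thread and hands each chunk to a pool of 10 workers. It is written in Python with concurrent.futures rather than Java so that it stays short; the same shape maps onto ExecutorService.submit. The names chunked, upload_chunk and run are hypothetical, upload_chunk is a placeholder for the real upload call, and a plain list stands in for the database table.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(rows, size):
    """Yield consecutive slices of `rows`, each at most `size` long."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def upload_chunk(node_id, chunk):
    """Placeholder for the real upload to one cluster node (assumption)."""
    return node_id, len(chunk)


def run(rows, chunk_size, workers=10):
    """Split rows into chunks and upload them with a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            # Chunk i is routed to node i % workers, round-robin.
            pool.submit(upload_chunk, i % workers, chunk)
            for i, chunk in enumerate(chunked(rows, chunk_size))
        ]
        # result() blocks until each task finishes, so no busy-wait is needed.
        return [future.result() for future in futures]


results = run(list(range(95)), chunk_size=10)
print(results)
```

In this arrangement the query lives outside the worker: one place decides the partitioning, and each worker only receives the rows it has to send.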

what's the meaning of ~0 in cpp or c

- by rima

Hi, what's the meaning of ~0 in this code? Can somebody analyze this code for me?

    unsigned int Order(unsigned int maxPeriod = ~0) const
    {
        Point r = *this;
        unsigned int n = 0;
        while( r.x_ != 0 && r.y_ != 0 )
        {
            ++n;
            r += *this;
            if ( n > maxPeriod ) break;
        }
        return n;
    }

Please help me soon.
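
For reference, `~` is the bitwise NOT operator, so `~0` flips every bit of zero. When that value is converted to unsigned int for the default argument, the result is the largest value the type can hold, which makes maxPeriod effectively "no limit". Python integers are unbounded, so the sketch below masks the result to 32 bits; the 32-bit width is an assumption about the platform's unsigned int, and the helper name is hypothetical.

```python
UINT_BITS = 32                      # assumption: unsigned int is 32 bits wide
UINT_MASK = (1 << UINT_BITS) - 1    # 32 one-bits


def bitwise_not_unsigned(value):
    """Emulate C's ~value as stored in a 32-bit unsigned int."""
    return ~value & UINT_MASK


max_period = bitwise_not_unsigned(0)
print(hex(max_period), max_period)
```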

Too many local variables in ASP.NET

- by Yongwei Xing

Hi all, I wrote a control and used a tool to do the code analysis. There is one test that I didn't pass: Avoid excessive locals, http://msdn.microsoft.com/library/ms182263(VS.90).aspx. In my CreateChildControls function, I build a big table with lots of fields. I need to create a lot of TableRow and TableCell objects to construct the table, but these are not fields or properties of the control. They are local variables in the function, which are created dynamically. Should I make these TableCells and TableRows fields of the control, or should I just keep them as local variables in the CreateChildControls function? Best Regards,

Accessing data feed

- by racket99

I have a Java program on my desktop which displays financial data gleaned from the web. It is a 3rd party application. What I would like to do is intercept the data before it goes to the Java application and record it into a flat file for the purpose of later data analysis. Is this at all possible? I imagine the data are available and are entering my computer through some port which the Java app picks up and then displays. Help appreciated. Thanks

Why is Hadoop tightly bound to Linux?

- by user1676346

I am new to Hadoop. What are the specific reasons why Hadoop is so tightly bound to Linux, and why the cluster it runs on has to be homogeneous? I'm looking for really specific details that can tell me why Hadoop does not work well with Windows, and whether specific libraries or scripts are involved. My project is to deploy Hadoop without using Cygwin. I have already seen the article from Hayes Davis where he explained how to install Hadoop without Cygwin, but he said that there are some bugs. I might start from scratch to properly configure Hadoop on Windows, but if anyone can explain what, specifically, the reasons are that Hadoop doesn't work well on Windows, that would be very helpful.
