genetic algorithms - Page 69

Automatic tracking algorithm

- by nico

Hi everyone, I'm trying to write a simple tracking routine to track some points on a movie. Essentially I have a series of 100-frames-long movies, showing some bright spots on dark background. I have ~100-150 spots per frame, and they move over the course of the movie. I would like to track them, so I'm looking for some efficient (but possibly not overkilling to implement) routine to do that. A few more infos: the spots are a few (es. 5x5) pixels in size the movement are not big. A spot generally does not move more than 5-10 pixels from its original position. The movements are generally smooth. the "shape" of these spots is generally fixed, they don't grow or shrink BUT they become less bright as the movie progresses. the spots don't move in a particular direction. They can move right and then left and then right again the user will select a region around each spot and then this region will be tracked, so I do not need to automatically find the points. As the videos are b/w, I though I should rely on brigthness. For instance I thought I could move around the region and calculate the correlation of the region's area in the previous frame with that in the various positions in the next frame. I understand that this is a quite naïve solution, but do you think it may work? Does anyone know specific algorithms that do this? It doesn't need to be superfast, as long as it is accurate I'm happy. Thank you nico

Read the article

What does O(log n) mean exactly?

- by Andreas Grech

I am currently learning about Big O Notation running times and amortized times. I understand the notion of O(n) linear time, meaning that the size of the input affects the growth of the algorithm proportionally...and the same goes for, for example, quadratic time O(n2) etc..even algorithms, such as permutation generators, with O(n!) times, that grow by factorials. For example, the following function is O(n) because the algorithm grows in proportion to its input n: f(int n) { int i; for (i = 0; i < n; ++i) printf("%d", i); } Similarly, if there was a nested loop, the time would be O(n2). But what exactly is O(log n)? For example, what does it mean to say that the height of a complete binary tree is O(log n)? I do know (maybe not in great detail) what Logarithm is, in the sense that: log10 100 = 2, but I cannot understand how to identify a function with a logarithmic time.

Read the article

Why Stream/lazy val implementation using is faster than ListBuffer one

- by anrizal

I coded the following implementation of lazy sieve algorithms using Stream and lazy val below : def primes(): Stream[Int] = { lazy val ps = 2 #:: sieve(3) def sieve(p: Int): Stream[Int] = { p #:: sieve( Stream.from(p + 2, 2). find(i=> ps.takeWhile(j => j * j <= i). forall(i % _ > 0)).get) } ps } and the following implementation using (mutable) ListBuffer: import scala.collection.mutable.ListBuffer def primes(): Stream[Int] = { def sieve(p: Int, ps: ListBuffer[Int]): Stream[Int] = { p #:: { val nextprime = Stream.from(p + 2, 2). find(i=> ps.takeWhile(j => j * j <= i). forall(i % _ > 0)).get sieve(nextprime, ps += nextprime) } } sieve(3, ListBuffer(3))} When I did primes().takeWhile(_ < 1000000).size , the first implementation is 3 times faster than the second one. What's the explanation for this ? I edited the second version: it should have been sieve(3, ListBuffer(3)) instead of sieve(3, ListBuffer()) .

Read the article

Have I taken a wrong path in programming by being excessively worried about code elegance and style?

- by Ygam

I am in a major stump right now. I am a BSIT graduate, but I only started actual programming less than a year ago. I observed that I have the following attitude in programming: I tend to be more of a purist, scorning unelegant approaches to solving problems using code I tend to look at anything in a large scale, planning everything before I start coding, either in simple flowcharts or complex UML charts I have a really strong impulse on refactoring my code, even if I miss deadlines or prolong development times I am obsessed with good directory structures, file naming conventions, class, method, and variable naming conventions I tend to always want to study something new, even, as I said, at the cost of missing deadlines I tend to see software development as something to engineer, to architect; that is, seeing how things relate to each other and how blocks of code can interact (I am a huge fan of loose coupling) i.e the OOP thinking I tend to combine OOP and procedural coding whenever I see fit I want my code to execute fast (thus the elegant approaches and refactoring) This bothers me because I see my colleagues doing much better the other way around (aside from the fact that they started programming since our first year in college). By the other way around I mean, they fire up coding, gets the job done much faster because they don't have to really look at how clean their codes are or how elegant their algorithms are, they don't bother with OOP however big their projects are, they mostly use web APIs, piece them together and voila! Working code! CLients are happy, they get paid fast, at the expense of a really unmaintainable or hard-to-read code that lacks structure and conventions, or slow executions of certain actions (which the common reasoning against would be that internet connections are much faster these days, hardware is more powerful). The excuse I often receive is clients don't care about how you write the code, but they do care about how long you deliver it. If it works then all is good. Now, did my "purist" approach to programming may have been the wrong way to start programming? Should I just dump these purist concepts and just code the hell up because I have seen it: clients don't really care how beautifully coded it is?

Read the article

How to set WS-SecurityPolicy in an inbound CXF service in Mule?

- by Brakara

When configuring the service for handling UsernameToken and signatures, it's setup like this: <service name="serviceName"> <inbound> <cxf:inbound-endpoint address="someUrl" protocolConnector="httpsConnector" > <cxf:inInterceptors> <spring:bean class="org.apache.cxf.binding.soap.saaj.SAAJInInterceptor" /> <spring:bean class="org.apache.cxf.ws.security.wss4j.WSS4JInInterceptor"> <spring:constructor-arg> <spring:map> <spring:entry key="action" value="UsernameToken Timestamp Signature" /> <spring:entry key="passwordCallbackRef" value-ref="serverCallback" /> <spring:entry key="signaturePropFile" value="wssecurity.properties" /> </spring:map> </spring:constructor-arg> </spring:bean> </cxf:inInterceptors> </cxf:inbound-endpoint> </inbound> </service> But how is it possible to create a policy of what algorithms that are allowed, and what parts of the message that should be signed?

Read the article

Log 2 N generic comparison tree

- by Morano88

Hey! I'm working on an algorithm for Redundant Binary Representation (RBR) where every two bits represent a digit. I designed the comparator that takes 4 bits and gives out 2 bits. I want to make the comparison in log 2 n so If I have X and Y .. I compare every 2 bits of X with every 2 bits of Y. This is smooth if the number of bits of X or Y equals n where (n = 2^X) i.e n = 2,4,8,16,32,... etc. Like this : However, If my input let us say is 6 or 10 .. then it becomes not smooth and I have to take into account some odd situations like this : I have a shallow experience in algorithms .. is there a generic way to do it .. so at the end I get only 2 bits no matter what the input is ? I just need hints or pseudo-code. If my question is not appropriate here .. so feel free to flag it or tell me to remove it. I'm using VHDL by the way!

Read the article

Generating easy-to-remember random identifiers

- by Carl Seleborg

Hi all, As all developers do, we constantly deal with some kind of identifiers as part of our daily work. Most of the time, it's about bugs or support tickets. Our software, upon detecting a bug, creates a package that has a name formatted from a timestamp and a version number, which is a cheap way of creating reasonably unique identifiers to avoid mixing packages up. Example: "Bug Report 20101214 174856 6.4b2". My brain just isn't that good at remembering numbers. What I would love to have is a simple way of generating alpha-numeric identifiers that are easy to remember. Examples would be "azil3", "ulmops", "fel2way", etc. I just made these up, but they are much easier to recognize when you see many of them at once. I know of algorithms that perform trigram analysis on text (say you feed them a whole book in German) and that can generate strings that look and feel like German words. This requires lots of data, though, and makes it slightly less suitable for embedding in an application just for this purpose. Do you know of anything else? Thanks! Carl

Read the article

Divide and conquer of large objects for GC performance

- by Aperion

At my work we're discussing different approaches to cleaning up a large amount of managed ~50-100MB memory.There are two approaches on the table (read: two senior devs can't agree) and not having the experience the rest of the team is unsure of what approach is more desirable, performance or maintainability. The data being collected is many small items, ~30000 which in turn contains other items, all objects are managed. There is a lot of references between these objects including event handlers but not to outside objects. We'll call this large group of objects and references as a single entity called a blob. Approach #1: Make sure all references to objects in the blob are severed and let the GC handle the blob and all the connections. Approach #2: Implement IDisposable on these objects then call dispose on these objects and set references to Nothing and remove handlers. The theory behind the second approach is since the large longer lived objects take longer to cleanup in the GC. So, by cutting the large objects into smaller bite size morsels the garbage collector will processes them faster, thus a performance gain. So I think the basic question is this: Does breaking apart large groups of interconnected objects optimize data for garbage collection or is better to keep them together and rely on the garbage collection algorithms to processes the data for you? I feel this is a case of pre-optimization, but I do not know enough of the GC to know what does help or hinder it.

Read the article

Select all points in a matrix within 30m of another point

- by pinnacler

So if you look at my other posts, it's no surprise I'm building a robot that can collect data in a forest, and stick it on a map. We have algorithms that can detect tree centers and trunk diameters and can stick them on a cartesian XY plane. We're planning to use certain 'key' trees as natural landmarks for localizing the robot, using triangulation and trilateration among other methods, but programming this and keeping data straight and efficient is getting difficult using just Matlab. Is there a technique for sub-setting an array or matrix of points? Say I have 1000 trees stored over 1km (1000m), is there a way to say, select only points within 30m radius of my current location and work only with those? I would just use a GIS, but I'm doing this in Matlab and I'm unaware of any GIS plugins for Matlab. I forgot to mention, this code is going online, meaning it's going on a robot for real-time execution. I don't know if, as the map grows to several miles, using a different data structure will help or if calculating every distance to a random point is what a spatial database is going to do anyway. I'm thinking of mirroring two arrays, one sorted by X and the other by Y. Then bubble sorting to determine the 30m range in that. I do the same for both arrays, X and Y, and then have a third cross link table that will select the individual values. But I don't know, what that's called, how to program that and I'm sure someone already has so I don't want to reinvent the wheel. Cartesian Plane GIS

Read the article

Practical rules for premature optimization

- by DougW

It seems that the phrase "Premature Optimization" is the buzz-word of the day. For some reason, iphone programmers in particular seem to think of avoiding premature optimization as a pro-active goal, rather than the natural result of simply avoiding distraction. The problem is, the term is beginning to be applied more and more to cases that are completely inappropriate. For example, I've seen a growing number of people say not to worry about the complexity of an algorithm, because that's premature optimization (eg http://stackoverflow.com/questions/2190275/help-sorting-an-nsarray-across-two-properties-with-nssortdescriptor/2191720#2191720). Frankly, I think this is just laziness, and appalling to disciplined computer science. But it has occurred to me that maybe considering the complexity and performance of algorithms is going the way of assembly loop unrolling, and other optimization techniques that are now considered unnecessary. What do you think? Are we at the point now where deciding between an O(n^n) and O(n!) complexity algorithm is irrelevant? What about O(n) vs O(n*n)? What do you consider "premature optimization"? What practical rules do you use to consciously or unconsciously avoid it? This is a bit vague, but I'm curious to hear other peoples' opinions on the topic.

Read the article

Webcam video stream processing.

- by vikramtheone

Hi Guys, I'm working with an image processing project, my final goal is to detect features on a real time video and finally track those features. I will be working with an Embedded Processor Platform called Freescale's i.MX515, it is a 32-bit media processor running on Ubuntu 9.04. Right now I'm working on the algorithms to locate the features, so, I'm using still images. When I'm satisfied with the results I will have to start using a video stream and I don't want to make use of a video file as a source stream, because then I will have to worry about video decoders then. Instead I would like to plug in a USB Wecam to the embedded platform (It has USB ports on it), directly take the frames as they are captured and send it to my application. I will take care to buy a webcam which will be supported in Linux (Device driver). But my question is will I be able to capture the incoming video stream from the webcam and send it to my application? Will I be able to configure the webcam and DMA to write the incoming frames in a particular memory location whose pointer I can simply pass to my application? (Confused!!!) I hope I could convey my doubts, can anyone guide me with what steps that I have to take to achieve all of these easily? Do you foresee any impossibility here? Help!!! Regards Vikram

Read the article

Simplifying Testing through design considerations while utilizing dependency injection

- by Adam Driscoll

We are a few months into a green-field project to rework the Logic and Business layers of our product. By utilizing MEF (dependency injection) we have achieved high levels of code coverage and I believe that we have a pretty solid product. As we have been working through some of the more complex logic I have found it increasingly difficult to unit test. We are utilizing the CompositionContainer to query for types required by these complex algorithms. My unit tests are sometimes difficult to follow due to the lengthy mock object setup process that must take place, just right, to allow for certain circumstances to be verified. My unit tests often take me longer to write than the code that I'm trying to test. I realize this is not only an issue with dependency injection but with design as a whole. Is poor method design or lack of composition to blame for my overly complex tests? I've tried base classing tests, creating commonly used mock objects and ensuring that I utilize the container as much as possible to ease this issue but my tests always end up quite complex and hard to debug. What are some tips that you've seen to keep such tests concise, readable, and effective?

Read the article

Ideas Related to Subset Sum with 2,3 and more integers

- by rolandbishop

I've been struggling with this problem just like everyone else and I'm quite sure there has been more than enough posts to explain this problem. However in terms of understanding it fully, I wanted to share my thoughts and get more efficient solutions from all the great people in here related to Subset Sum problem. I've searched it over the Internet and there is actually a lot sources but I'm really willing to re-implement an algorithm or finding my own in order to understand fully. The key thing I'm struggling with is the efficiency considering the set size will be large. (I do not have a limit, just conceptually large). The two phases I'm trying to implement ideas on is finding two numbers that are equal to given integer T, finding three numbers and eventually K numbers. Some ideas I've though; For the two integer part I'm thing basically sorting the array O(nlogn) and for each element in the array searching for its negative value. (i.e if the array element is 3 searching for -3). Maybe a hash table inclusion could be better, providing a O(1) indexing the element? For the three or more integers I've found an amazing blog post;http://www.skorks.com/2011/02/algorithms-a-dropbox-challenge-and-dynamic-programming/. However even the author itself states that it is not applicable for large numbers. So I was for 2 and 3 and more integers what ideas could be applied for the subset problem. I'm struggling with setting up a dynamic programming method that will be efficient for the large inputs as well.

Read the article

What image processing Library should I use

- by Swippen

I have been reading What is the best image manipulation library? And tried a few libraries and are now looking for inputs on what is the best for our need. I will start by describing our current setting and problems. We have a system that needs to resize and crop a large amount of images from big original images. We handle 50 000+ images every day on 2 powerfull servers. Today we use ImageGlue from WebSupergoo but we don't like it at all, it is slow and hangs the service now and then (Its in another unanswered stack overflow question). We have a threaded windows service that uses Microsoft ThreadPool to resize as much as possible on the 8 core machines. I have tried AForge and it went very well it was loads faster and never crashed or anything. But I had problems with quality on a few images. This due to what algorithms I used ofc so can be tweaked. But want to widen our eyes to see if thats the right way to go. so: It needs to be c# .net and run in a windows service. (Since we wont change the rest of the service only image handling) It needs to handle threaded environment well. We have a great need of it being fast since today its too slow. But we also want good quality and small filesize since the images are later displayed on webpage with loads of visitors and needs good quality. So we have a lot of demands on ability to get god quality at a fast pace, and also secondary keep filesizes lowered even if that can be adjusted with compression a bit. Any comments or suggestions on what library to use?

Read the article

Web application architecture, and application servers?

- by seanieb

Hi, I'm building a web application, and I need to use an architecture that allows me to run it on two servers. The application scrapes information from other sites periodically, and on input from the end user. To do this I'm using Php+curl to scrape the information, Php or python to parse it and store the results in a MySQLDB. Then I will use Python to run some algorithms on the data, this will happen both periodically and on input from the end user. I'm going to cache some of the results in the MySQL DB and sometimes if it is specific to the user, skip storing the data and serve it to the user. I'm think of using Php for the website front end on a separate web server, running the Php spider, MySQL DB and python on another server. As you can see I'm fairly clueless. I'm familiar with using Php, MySQL and the basics of Python, but bringing this all together using something more complex than a cron job is new to me. How do go about implementing this? What frame work(s) should I use? Is MVC a good architecture for this? (I'm new to MVC, architectures etc.) Is Cakephp a good solution? If so will I be able to control and monitor the Python code using it?

Read the article

Machine leaning algorithm for data classification.

- by twk

Hi all, I'm looking for some guidance about which techniques/algorithms I should research to solve the following problem. I've currently got an algorithm that clusters similar-sounding mp3s using acoustic fingerprinting. In each cluster, I have all the different metadata (song/artist/album) for each file. For that cluster, I'd like to pick the "best" song/artist/album metadata that matches an existing row in my database, or if there is no best match, decide to insert a new row. For a cluster, there is generally some correct metadata, but individual files have many types of problems: Artist/songs are completely misnamed, or just slightly mispelled the artist/song/album is missing, but the rest of the information is there the song is actually a live recording, but only some of the files in the cluster are labeled as such. there may be very little metadata, in some cases just the file name, which might be artist - song.mp3, or artist - album - song.mp3, or another variation A simple voting algorithm works fairly well, but I'd like to have something I can train on a large set of data that might pick up more nuances than what I've got right now. Any links to papers or similar projects would be greatly appreciated. Thanks!

Read the article

Using game of life or other virtual environment for artificial (intelligence) life simulation? [clos

- by Berlin Brown

One of my interests in AI focuses not so much on data but more on biologic computing. This includes neural networks, mapping the brain, cellular-automata, virtual life and environments. Described below is an exciting project that includes develop a virtual environment for bots to evolve in. "Polyworld is a cross-platform (Linux, Mac OS X) program written by Larry Yaeger to evolve Artificial Intelligence through natural selection and evolutionary algorithms." http://en.wikipedia.org/wiki/Polyworld " Polyworld is a promising project for studying virtual life but it still is far from creating an "intelligent autonomous" agent. Here is my question, in theory, what parameters would you use create an AI environment? Possibly a brain environment? Possibly multiple self contained life organisms that have their own "brain" or life structures. I would like a create a spin on the game of life simulation. What if you have a 64x64 game of life grid. But instead of one grid, you might have N number of grids. The N number of grids are your "life force" If all of the game of life entities die in a particular grid then that entire grid dies. A group of "grids" makes up a life form. I don't have an immediate goal. First, I want to simulate an environment and visualize what is going on in the environment with OpenGL and see if there are any interesting properties to the environment. I then want to add "scarce resources" and see if the AI environment can manage resources adequately.

Read the article

Need some ignition in Embedded Systems

- by Rahul

I'm very much interested in building applications for Embedded Devices. I'm in my 3rd year Electrical Engineering and I'm passionate about coding, algorithms, Linux OS, etc. And also by Googling I found out that Linux OS is one of the best OSes for Embedded devices(may be/may not be). I want to work for companies which work on mobile applications. I'm a newbie/naive to this domain & my skills include C/C++ & MySQL. I need help to get started in the domain of Embedded Systems; like how/where to start off, Hardware prerequisites, necessary programming skills, also what kind of Embedded Applications etc. I've heard of ARM, firmware, PIC Micorcontrollers; but I don't know anything & just need proper introduction about them. Thanx. P.S: I'm currently reading Bjarne Struotsup's lecture in C++ at Texas A&M University, and one chapter in it describes about Embedded Systems Programming.

Read the article

The Immerman-Szelepcsenyi Theorem

- by Daniel Lorch

In the Immerman-Szelepcsenyi Theorem, two algorithms are specified that use non-determinisim. There is a rather lengthy algorithm using "inductive counting", which determines the number of reachable configurations for a given non-deterministic turing machine. The algorithm looks like this: Let m_{i+1}=0 For all configurations C Let b=0, r=0 For all configurations D Guess a path from I to D in at most i steps If found Let r=r+1 If D=C or D goes to C in 1 step Let b=1 If r<m_i halt and reject Let m_{i+1}=m_{i+1}+b I is the starting configuration. m_i is the number of configurations reachable from the starting configuration in i steps. This algorithm only calculates the "next step", i.e. m_i+1 from m_i. This seems pretty reasonable, but since we have nondeterminisim, why don't we just write: Let m_i = 0 For all configurations C Guess a path from I to C in at most i steps If found m_i = m_i + 1 What is wrong with this algorithm? I am using nondeterminism to guess a path from I to C, and I verify reachability I am iterating through the list of ALL configurations, so I am sure to not miss any configuration I respect space bounds I can generate a certificate (the list of reachable configs) I believe I have a misunderstanding of the "power" of non-determinisim, but I can't figure out where to look next. I am stuck on this for quite a while and I would really appreciate any help.

Read the article

finding ALL cycles in a huge sparse matrix

- by Andy

Hi there, First of all I'm quite a Java beginner, so I'm not sure if this is even possible! Basically I have a huge (3+million) data source of relational data (i.e. A is friends with B+C+D, B is friends with D+G+Z (but not A - i.e. unmutual) etc.) and I want to find every cycle within this (not necessarily connected) directed graph. I've found this thread (http://stackoverflow.com/questions/546655/finding-all-cycles-in-graph/549402#549402) which has pointed me to Donald Johnson's (elementary) cycle-finding algorithm which, superficially at least, looks like it'll do what I'm after (I'm going to try when I'm back at work on Tuesday - thought it wouldn't hurt to ask in the meanwhile!). I had a quick scan through the code of the Java implementation of Johnson's algorithm (in that thread) and it looks like a matrix of relations is the first step, so I guess my questions are: a) Is Java capable of handling a 3+million*3+million matrix? (was planning on representing A-friends-with-B by a binary sparse matrix) b) Do I need to find every connected subgraph as my first problem, or will cycle-finding algorithms handle disjoint data? c) Is this actually an appropriate solution for the problem? My understanding of "elementary" cycles is that in the graph below, rather than picking out A-B-C-D-E-F it'll pick out A-B-F, B-C-D etc. but that's not the end of the world given the task. E / \ D---F / \ / \ C---B---A d) If necessary, I can simplify the problem by enforcing mutuality in relations - i.e. A-friends-with-B <== B-friends-with-A, and if really necessary I can maybe cut down the data size, but realistically it is always going to be around the 1mil mark. z) Is this a P or NP task?! Am I biting off more than I can chew? Thanks all, any help appreciated! Andy

Read the article

Have any software engineers gotten math degrees later in their careers?

- by vin

I have a bachelors in computer science and worked the last 12 years as a software engineer. I'm bored with doing general development work, so I want to specialize. I'm thinking about getting a master's degree in math so I can build math models and write algorithms to implement them. I'm unsure what type of work I'd do (financial, gaming, graphics, science, research, etc) but I'm open minded. I would need to refresh my undergrad math skills (which are old and faded), but I loved algebra and calculus. I've been working with couple statisticians so I've been finding myself more interested in statistics. Since I'm a parent supporting a household, I would have to continue working while studying. Have any software engineers taken this route? (Specifically, going from BS in comp sci to MS in math.) If so, what advice do you have for coursework, financing, and getting a job that combines programming with advanced math? How abundant are these kinds of jobs? I'm not sure where one starts. Also, how do you hop from a BS to an MS in a different subject?

Read the article

Automated Legal Processing

- by Chris S

Will it ever be possible to make legal systems quantifiable enough to process with computer algorithms? What technologies would have to be in place before this is possible? Are there any existing technologies that are already trying to accomplish this? Out of curiosity, I downloaded the text for laws in my local municipality, and tried applying some simple NLP tricks to extract rules from sentences. I had mixed results. Some sentences were very explicit (e.g. "Cars may not be left in the park overnight"), but other sentences seemed hopelessly vague (e.g. "The council's purpose is to ensure the well-being of the community"). I apologize if this is too open-ended a topic, but I've often wondered what society would look like if legal systems were based on less ambiguous language. Lawyers, and the legal process in general, are so expensive because they have to manually process a complex set of rules codified in ambiguous legal texts. If this system could be represented in software, this huge expense could potentially be eliminated, making the legal system more accessible for everyone.

Read the article

How to implement a category hierarchy using collections

- by Luke101

Hello, I have about 200 categories that are nested. I am currently reading the documention on the C5 generics library. I am not sure if the C5 library is overkill or not. I am looking at converting all my custom algorithms to the C5 implemention. This is what I need. If a certain category is chosen i need to find its parents, siblings, direct children, and all children. This is the way I have it set up. To find the: Parents: I start from the current location then loop through the list and find the current parent. When I find the parent I loop through the whole list again to find the next parent and so on. Siblings: I loop through the whole list and find all the nodes that have the same parent as the choosen node. direct children: I loop through the whole list and find all nodes that is a parent of the choosen node. All Children: This one took me a while to figure out. But I used recursion to find all children of the choosen node. Is there a better way to implement something like this?

Read the article

Why i am getting NullPointerException for this btree method??

- by user306540

hi, i am writing code for btree algorithms. i am getting NullPointerException . why???? please somebody help me...! public void insertNonFull(BPlusNode root,BPlusNode parent,String key) { int i=0; BPlusNode child=new BPlusNode(); BPlusNode node=parent; while(true) { i=node.numKeys-1; if(node.leaf) { while(i>=0 && key.compareTo(node.keys[i])<0) { node.keys[i+1]=node.keys[i]; i--; } node.keys[i+1]=key; node.numKeys=node.numKeys+1; } else { while(i>=0 && key.compareTo(node.keys[i])<0) { i--; } } i++; child=node.pointers[i]; if(child!=null && child.numKeys==7) { splitChild(root,node,i,child); if(key.compareTo(node.keys[i])>0) { i++; } } node=node.pointers[i]; } }

Read the article

incremental way of counting quantiles for large set of data

- by Gacek

I need to count the quantiles for a large set of data. Let's assume we can get the data only through some portions (i.e. one row of a large matrix). To count the Q3 quantile one need to get all the portions of the data and store it somewhere, then sort it and count the quantile: List<double> allData = new List<double>(); foreach(var row in matrix) // this is only example. In fact the portions of data are not rows of some matrix { allData.AddRange(row); } allData.Sort(); double p = 0.75*allData.Count; int idQ3 = (int)Math.Ceiling(p) - 1; double Q3 = allData[idQ3]; Now, I would like to find a way of counting this without storing the data in some separate variable. The best solution would be to count some parameters od mid-results for first row and then adjust it step by step for next rows. Note: These datasets are really big (ca 5000 elements in each row) The Q3 can be estimated, it doesn't have to be an exact value. I call the portions of data "rows", but they can have different leghts! Usually it varies not so much (+/- few hundred samples) but it varies! This question is similar to this one: http://stackoverflow.com/questions/1058813/on-line-iterator-algorithms-for-estimating-statistical-median-mode-skewness But I need to count quantiles. ALso there are few articles in this topic, i.e.: http://web.cs.wpi.edu/~hofri/medsel.pdf http://portal.acm.org/citation.cfm?id=347195&dl But before I would try to implement these, I wanted to ask you if there are maybe any other, qucker ways of counting the 0.25/0.75 quantiles?

Search Results

Search found 1933 results on 78 pages for 'genetic algorithms'.

Page 69/78 | < Previous Page | 65 66 67 68 69 70 71 72 73 74 75 76 | Next Page >

- by nico

- by Andreas Grech

- by anrizal

- by Ygam

- by Brakara

- by Morano88

- by Carl Seleborg

- by Aperion

- by pinnacler

- by DougW

- by vikramtheone

- by Adam Driscoll

- by rolandbishop

- by Swippen

- by seanieb

- by twk

- by Berlin Brown

- by Rahul

- by Daniel Lorch

- by Andy

- by vin

- by Chris S

- by Luke101

- by user306540

- by Gacek

< Previous Page | 65 66 67 68 69 70 71 72 73 74 75 76 | Next Page >