Search Results

Search found 5298 results on 212 pages for 'marching cubes algorithm'.


Is there a faster way to parse through a large file with regex quickly?

- by Ray Eatmon

Problem: I have a very, very large file that I need to parse line by line to get 3 values from each line. Everything works, but it takes a long time to parse the whole file. Is it possible to do this within seconds? It typically takes between 1 and 2 minutes. An example file size is 148,208 KB. I am using a regex to parse every line.

Here is my C# code:

    private static void ReadTheLines(int max, Responder rp, string inputFile)
    {
        List<int> rate = new List<int>();
        double counter = 1;
        try
        {
            using (var sr = new StreamReader(inputFile, Encoding.UTF8, true, 1024))
            {
                string line;
                Console.WriteLine("Reading....");
                while ((line = sr.ReadLine()) != null)
                {
                    if (counter <= max)
                    {
                        counter++;
                        rate = rp.GetRateLine(line);
                    }
                    else if (max == 0)
                    {
                        counter++;
                        rate = rp.GetRateLine(line);
                    }
                }
                rp.GetRate(rate);
                Console.ReadLine();
            }
        }
        catch (Exception e)
        {
            Console.WriteLine("The file could not be read:");
            Console.WriteLine(e.Message);
        }
    }

Here is my regex:

    public List<int> GetRateLine(string justALine)
    {
        const string reg = @"^\d{1,}.+\[(.*)\s[\-]\d{1,}].+GET.*HTTP.*\d{3}[\s](\d{1,})[\s](\d{1,})$";
        Match match = Regex.Match(justALine, reg, RegexOptions.IgnoreCase);

        // Here we check the Match instance.
        if (match.Success)
        {
            // Finally, we get the Group value and display it.
            string theRate = match.Groups[3].Value;
            Ratestorage.Add(Convert.ToInt32(theRate));
        }
        else
        {
            Ratestorage.Add(0);
        }
        return Ratestorage;
    }

Here is an example line to parse; a file usually has around 200,000 lines:

    10.10.10.10 - - [27/Nov/2002:16:46:20 -0500] "GET /solr/ HTTP/1.1" 200 4926 789
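
One common first step for this kind of slowdown is to compile the pattern once and replace the open-ended `.+` and `.*` runs with tighter character classes, so the engine backtracks less. The sketch below shows that idea in Python rather than the asker's C#. The pattern and the name `parse_line` are assumptions for illustration, inferred only from the single sample line above.

```python
import re

# Assumed layout, taken from the one sample line in the question:
# ip - - [timestamp zone] "GET path HTTP/x.y" status number number
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[([^\]\s]+) [+-]\d+\] "GET \S+ HTTP/[\d.]+" \d{3} (\d+) (\d+)$'
)

def parse_line(line):
    """Return (timestamp, first_number, second_number), or None if the line does not match."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))

sample = '10.10.10.10 - - [27/Nov/2002:16:46:20 -0500] "GET /solr/ HTTP/1.1" 200 4926 789'
print(parse_line(sample))
```

The pattern object is built once at import time, so the per-line cost is only the match itself. Whether this brings a 148 MB file down to seconds depends on the machine and on how varied the real lines are.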

Naive Bayesian classification (spam filtering) - Doubt in one calculation? Which one is right? Plz c

- by Microkernel

Hi guys, I am implementing a Naive Bayesian classifier for spam filtering and I have a doubt about one calculation. Please clarify what I should do. Here is my question.

In this method, you have to calculate:

P(S|W) - the probability that a message is spam given that word W occurs in it.
P(W|S) - the probability that word W occurs in a spam message.
P(W|H) - the probability that word W occurs in a ham message.

So, to calculate P(W|S), should I do
(1) (number of times W occurs in spam) / (total number of times W occurs in all the messages), or
(2) (number of times W occurs in spam) / (total number of words in the spam message)?

(I thought it was (2), but I am not sure, so please clarify.) I am referring to http://en.wikipedia.org/wiki/Bayesian_spam_filtering for the info, by the way. I have to complete the implementation by this weekend :(

Thanks and regards, MicroKernel :)

@sth: Hmm... Shouldn't repeated occurrences of word W increase a message's spam score? In your approach it wouldn't, right? Let's take a scenario and discuss. Say we have 100 training messages, of which 50 are spam and 50 are ham, and the word count of each message is 100. Say word W occurs 5 times in each spam message and 1 time in each ham message. So the total number of times W occurs in all the spam messages is 5*50 = 250, and the total number of times W occurs in all the ham messages is 1*50 = 50. The total occurrence of W in all of the training messages is 250+50 = 300. So, in this scenario, how do you calculate P(W|S) and P(W|H)? Naturally we should expect P(W|S) > P(W|H), right? Please share your thoughts...
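
The scenario above can be worked through numerically. This sketch reads option (2) as a word-frequency estimate pooled over all spam messages, which is an assumption about what the asker means. It also shows that, with these equal class sizes, option (1) gives the same number as Bayes' rule gives for P(S|W), not P(W|S).

```python
# Numbers taken from the scenario in the question.
spam_messages, ham_messages = 50, 50
words_per_message = 100
w_per_spam, w_per_ham = 5, 1

w_in_spam = w_per_spam * spam_messages   # 250 occurrences of W in spam
w_in_ham = w_per_ham * ham_messages      # 50 occurrences of W in ham

# Option (2), pooled over each class (assumed reading of the question).
p_w_given_s = w_in_spam / (spam_messages * words_per_message)
p_w_given_h = w_in_ham / (ham_messages * words_per_message)

# Option (1): share of all W occurrences that fall in spam.
option_1 = w_in_spam / (w_in_spam + w_in_ham)

# Bayes' rule with equal priors, for comparison with option (1).
p_s = p_h = 0.5
p_s_given_w = p_w_given_s * p_s / (p_w_given_s * p_s + p_w_given_h * p_h)

print(p_w_given_s, p_w_given_h, option_1, p_s_given_w)
```

Under these assumptions P(W|S) comes out five times larger than P(W|H), which matches the expectation stated at the end of the question.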

How to generate a number in arbitrary range using random()={0..1} preserving uniformness and density?

- by psihodelia

Generate a random number in the range [x..y], where x and y are arbitrary floating-point numbers. Use the function random(), which returns a random floating-point number in the range [0..1] drawn from P uniformly distributed numbers (call P the "density"). The uniform distribution must be preserved, and P must be scaled as well. I think there is no easy solution to this problem. To simplify it a bit, I ask how to generate a number in the interval [-0.5 .. 0.5], then in [0 .. 2], then in [-2 .. 0], preserving uniformness and density. Thus, for [0 .. 2] it must generate a random number from P*2 uniformly distributed numbers. The obvious simple solution, random() * (x - y) + y, will not generate all possible numbers, because the density is lower in all cases where abs(x-y) > 1.0. Many possible values will be missed. Remember that random() returns only one of P possible numbers. If you multiply such a number by Q, you still get only one of P possible values, scaled by Q, but you have to scale the density P by Q as well.
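
One way to keep the density when the interval is wider than 1 is to spend a second draw: the first draw picks a sub-interval of width at most 1, the second picks the position inside it. The sketch below is only an illustration under stated assumptions. The name `random_in_range` and the `rand` parameter are made up here, and the choice of sub-interval is exactly uniform only when the number of sub-intervals divides P.

```python
import math
import random

def random_in_range(x, y, rand=random.random):
    """Value in [x, y] for x < y, using two draws so wide ranges keep their density.

    Assumption: rand() returns one of P equally spaced values in [0, 1].
    """
    width = y - x
    if width <= 0:
        raise ValueError("expected x < y")
    cells = math.ceil(width)                      # sub-intervals, each at most 1 wide
    cell_width = width / cells
    index = min(int(rand() * cells), cells - 1)   # first draw picks the sub-interval
    offset = rand() * cell_width                  # second draw picks the position inside it
    return x + index * cell_width + offset

print(random_in_range(0.0, 2.0))
```

For [0 .. 2] this yields up to 2*P distinct values instead of P, at the cost of one extra call to random().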

Genetic programming in c++, library suggestions?

- by shuttle87

I'm looking to add some genetic algorithms to an operations research project I have been involved in. Currently we have a program that aids in optimizing some scheduling, and we want to add some heuristics in the form of genetic algorithms. Are there any good libraries for generic genetic programming/algorithms in C++? Or would you recommend I just code my own? I should add that while I am not new to C++, I am fairly new to doing this sort of mathematical optimization work in C++, as the group I worked with previously tended to use a proprietary optimization package. We have a fitness function that is fairly computationally intensive to evaluate, and we have a cluster to run this on, so parallelized code is highly desirable. So is C++ a good language for this? If not, please recommend some other ones, as I am willing to learn another language if it makes life easier. Thanks!

How can I test if a point lies within a 3d shape with its surface defined by a point cloud?

- by Ben

Hi, I have a collection of points which describe the surface of a shape that should be roughly spherical, and I need a method to determine whether any other given point lies within this shape. I've previously been approximating the shape as an exact sphere, but this has proven too inaccurate and I need a more accurate method. Simplicity and speed are preferable to complete accuracy; a good approximation will suffice. I've come across techniques for converting a point cloud to a 3D mesh, but most things I have found have been very complicated, and I am looking for something as simple as possible. Any ideas? Many thanks, Ben.
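
A simple approximation that avoids meshing is sketched below: take the centroid of the cloud, find the surface point whose direction from the centroid is closest to the query's direction, and compare the two distances. It assumes the shape is star-shaped around its centroid, which "roughly spherical" suggests but does not guarantee. The function names are hypothetical.

```python
import math

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def is_inside(query, surface_points, center=None):
    """Approximate test: compare the query's radius with the radius of the
    surface point lying in the most similar direction from the centre."""
    c = center if center is not None else centroid(surface_points)
    q = tuple(query[i] - c[i] for i in range(3))
    q_len = math.sqrt(sum(v * v for v in q))
    if q_len == 0.0:
        return True                      # the centre itself is taken to be inside
    best_cos, best_radius = -2.0, 0.0
    for p in surface_points:
        s = tuple(p[i] - c[i] for i in range(3))
        s_len = math.sqrt(sum(v * v for v in s))
        if s_len == 0.0:
            continue
        cos = sum(q[i] * s[i] for i in range(3)) / (q_len * s_len)
        if cos > best_cos:               # closest direction seen so far
            best_cos, best_radius = cos, s_len
    return q_len <= best_radius

cloud = [(2, 0, 0), (-2, 0, 0), (0, 2, 0), (0, -2, 0), (0, 0, 2), (0, 0, -2)]
print(is_inside((1, 0, 0), cloud), is_inside((3, 0, 0), cloud))
```

Each query is a linear scan over the cloud. Averaging the radii of the few nearest directions instead of using a single point would smooth out noise, at the cost of a little more code.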

What is the difference between tree depth and height?

- by Gabriel Šcerbák

This is a simple question from algorithm theory. The difference between them is that in one case you count the number of nodes and in the other the number of edges on the shortest path between the root and a given node. Which is which?
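
Under one common textbook convention (an assumption here, since texts differ on counting nodes versus edges), depth is measured downward from the root to a node and height is measured upward from the deepest leaf to a node, both in edges. A minimal sketch with a hypothetical `Node` class:

```python
class Node:
    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

def height(node):
    """Edges on the longest downward path from node to a leaf (a leaf has height 0)."""
    return 0 if not node.children else 1 + max(height(c) for c in node.children)

def depths(root):
    """Map each node value to its depth: edges on the path from the root (the root has depth 0)."""
    result, stack = {}, [(root, 0)]
    while stack:
        node, d = stack.pop()
        result[node.value] = d
        stack.extend((c, d + 1) for c in node.children)
    return result

tree = Node("a", [Node("b", [Node("d")]), Node("c")])
print(height(tree), depths(tree))
```

With this convention the height of the root equals the largest depth in the tree.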

Why does Java's hashCode() in String use 31 as a multiplier?

- by jacobko

In Java, the hash code for a String object is computed as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. Why is 31 used as a multiplier? I understand that the multiplier should be a relatively large prime number. So why not 29, or 37, or even 97?
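
The formula above can be reproduced outside Java to experiment with other multipliers. The sketch below mimics the 32-bit wrap-around in Python; it assumes every character fits in one UTF-16 code unit. One commonly cited rationale for 31 is that it is an odd prime and that 31*h can be computed as (h << 5) - h, which the last line checks.

```python
def java_string_hash(s, multiplier=31):
    """Mimic String.hashCode(): h = multiplier*h + c per character, wrapped to a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (multiplier * h + ord(ch)) & 0xFFFFFFFF   # keep the low 32 bits
    return h - 0x100000000 if h >= 0x80000000 else h  # reinterpret as signed

print(java_string_hash("abc"))
print(all((h << 5) - h == 31 * h for h in range(1000)))
```

Passing a different `multiplier` (a parameter added here for illustration) makes it easy to compare collision counts for 29, 37 or 97 on a word list of your own.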

Clamping a vector to a minimum and maximum?

- by user146780

I came across this: t = Clamp(t/d, 0, 1), but I'm not sure how to perform this operation on a vector. What are the steps to clamp a vector if one were writing their own vector implementation? Thanks.

Clamp: clamping a vector to a minimum and a maximum, e.g.:

    pc = # the point you are coloring now
    p0 = # start point
    p1 = # end point
    v = p1 - p0
    d = Length(v)
    v = Normalize(v) # or Scale(v, 1/d)
    v0 = pc - p0
    t = Dot(v0, v)
    t = Clamp(t/d, 0, 1)
    color = (start_color * t) + (end_color * (1 - t))
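
In the snippet above, `t` is the result of a dot product, so it is a scalar and only a scalar clamp is needed. For an actual vector there are two usual readings, both sketched here with hypothetical function names: clamp each component independently, or clamp the vector's length while keeping its direction.

```python
def clamp(value, lo, hi):
    """Scalar clamp: the value limited to the range [lo, hi]."""
    return max(lo, min(hi, value))

def clamp_vector(v, lo, hi):
    """Clamp each component independently; lo and hi are scalars here."""
    return tuple(clamp(c, lo, hi) for c in v)

def clamp_length(v, max_len):
    """Shorten v to at most max_len while keeping its direction."""
    length = sum(c * c for c in v) ** 0.5
    if length <= max_len or length == 0:
        return tuple(v)
    scale = max_len / length
    return tuple(c * scale for c in v)

print(clamp(1.5, 0, 1), clamp_vector((-1, 0.5, 2), 0, 1), clamp_length((3.0, 4.0), 1.0))
```

Which reading is wanted depends on the use: per-component clamping suits colours, while length clamping suits velocities or forces.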

Travelling Salesman Problem Constraint Representation

- by alex25

Hey! I read a couple of articles and sample code about how to solve the TSP with genetic algorithms, ant colony optimization, etc. But everything I found assumed symmetry and didn't include time (window) constraints, e.g. "I have to be at customer x before 12am". Can somebody point me in the direction of some sample code or articles that explain how I can add constraints to the TSP and how I can represent them in code? Thanks!
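
One possible representation, sketched under assumptions: store a (earliest, latest) window per customer and a travel-time table that need not be symmetric, then let the tour evaluation return both the finishing time and the accumulated lateness. A genetic algorithm could use something like finishing time plus a penalty times lateness as its fitness. All names below are hypothetical.

```python
def evaluate_tour(tour, travel_time, windows, start_time=0.0):
    """Return (finish_time, total_lateness) for visiting `tour` in order.

    travel_time[a][b] may differ from travel_time[b][a] (asymmetric case).
    windows[c] = (earliest, latest) allowed arrival time at customer c.
    """
    t, lateness = start_time, 0.0
    for prev, cur in zip(tour, tour[1:]):
        t += travel_time[prev][cur]
        earliest, latest = windows[cur]
        if t < earliest:
            t = earliest              # arrived early: wait until the window opens
        elif t > latest:
            lateness += t - latest    # arrived late: record the violation
    return t, lateness

travel = {0: {1: 2, 2: 5}, 1: {0: 2, 2: 1}, 2: {0: 5, 1: 4}}
windows = {0: (0, float("inf")), 1: (3, 10), 2: (0, 3)}
print(evaluate_tour([0, 1, 2], travel, windows))
print(evaluate_tour([0, 2, 1], travel, windows))
```

Treating lateness as a penalty keeps every permutation usable by the optimizer; treating it as a hard constraint would mean rejecting any tour whose lateness is non-zero.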

How to arrange array in decreasing order of frequency of each number?

- by Rajendra

Input : {5, 13, 6, 5, 13, 7, 8, 6, 5} Output : {5, 5, 5, 13, 13, 6, 6, 7, 8} The question is to arrange the numbers in the array in decreasing order of their frequency, preserving the order of their occurrence. If there is a tie, for example between 13 and 6 here, then the number occurring first in the input array comes first in the output array.
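
A minimal sketch of one way to do this: count each value, remember the index at which it first appears, and sort on the pair (negative count, first index). The function name is hypothetical.

```python
from collections import Counter

def sort_by_frequency(values):
    """Order by decreasing frequency; ties are broken by first occurrence in the input."""
    counts = Counter(values)
    first_seen = {}
    for i, v in enumerate(values):
        first_seen.setdefault(v, i)    # keep only the first index of each value
    return sorted(values, key=lambda v: (-counts[v], first_seen[v]))

print(sort_by_frequency([5, 13, 6, 5, 13, 7, 8, 6, 5]))
```

Counting is linear and the sort is O(n log n); sorting only the distinct values and expanding them afterwards would reduce the sort to the number of distinct values.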

question about api functions

- by davit-datuashvili

I have a question. Java has API functions; can a user create their own function and add it to their Java IDE? For example, I am using NetBeans: can I create my own function and add it to the NetBeans IDE? Let's say a binary function or something else. Thanks.

Lexicographical sorting problem

- by Shawn Mclean

I'm doing a problem from a competition that says: concatenate the words to generate the lexicographically lowest possible string. Take for example this string: jibw ji jp bw jibw. The actual output turns out to be: bw jibw jibw ji jp. When I sort this, I get: bw ji jibw jibw jp. Does this mean that this is not sorting? If it is sorting, does lexicographic sorting take into consideration pushing the shorter strings to the back or something? I've been doing some reading on lexicographical order and I don't see any point or scenarios in which this is used; do you have any?
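
The expected output is consistent with a sort that uses a different comparison from plain string order: word a goes before word b when a+b is smaller than b+a. A minimal sketch of that comparator, with a hypothetical function name:

```python
from functools import cmp_to_key

def smallest_concatenation_order(words):
    """Order words so that joining them gives the lexicographically smallest string."""
    def compare(a, b):
        if a + b < b + a:
            return -1
        if a + b > b + a:
            return 1
        return 0
    return sorted(words, key=cmp_to_key(compare))

print(smallest_concatenation_order("jibw ji jp bw jibw".split()))
```

For example, "jibw" goes before "ji" because "jibwji" is smaller than "jijibw", even though "ji" sorts first as a plain string.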

Data Structure for a particular problem??

- by AGeek

Hi, which data structure can perform insertion, deletion and search in O(1) time in the worst case? We may assume the elements are integers drawn from a finite set 1,2,...,n, and initialization can take O(n) time. I can only think of implementing a hash table. Implementing it with trees will not give O(1) time complexity for any of the operations. Or is it possible? Kindly share your views on this, or on any other data structure apart from these. Thanks.
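
Because the keys are known to come from 1..n, a direct-address table fits the stated bounds: one flag per possible key, set up in O(n), with each operation a single array access. A minimal sketch with a hypothetical class name:

```python
class DirectAddressSet:
    """Set of integers from 1..n backed by one boolean flag per possible key."""

    def __init__(self, n):
        self._present = [False] * (n + 1)   # O(n) initialisation; index 0 is unused

    def insert(self, x):
        self._present[x] = True

    def delete(self, x):
        self._present[x] = False

    def contains(self, x):
        return self._present[x]

s = DirectAddressSet(10)
s.insert(3)
s.insert(7)
s.delete(3)
print(s.contains(3), s.contains(7))
```

Unlike a hash table there are no collisions, so the O(1) bound holds in the worst case and not only on average. The price is memory proportional to n regardless of how many elements are stored.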

How to output multicolumn html without "widows"?

- by user314850

I need to output to HTML a list of categorized links in exactly three columns of text. They must be displayed similar to columns in a newspaper or magazine. So, for example, if there are 20 lines total the first and second columns would contain 7 lines and the last column would contain 6. The list must be dynamic; it will be regularly changed. The tricky part is that the links are categorized with a title and this title cannot be a "widow". If you have a page layout background you'll know that this means the titles cannot be displayed at the bottom of the column -- they must have at least one link underneath them, otherwise they should bump to the next column (I know, technically it should be two lines if I were actually doing page layout, but in this case one is acceptable). I'm having a difficult time figuring out how to get this done. Here's an example of what I mean: Shopping Link 3 Link1 Link 1 Link 4 Link2 Link 2 Link 3 Link 3 Cars Link 1 Music Games Link 2 Link 1 Link 1 Link 2 News As you can see, the "News" title is at the bottom of the middle column, and so is a "widow". This is unacceptable. I could bump it to the next column, but that would create an unnecessarily large amount of white space at the bottom of the second column. What needs to happen instead is that the entire list needs to be re-balanced. I'm wondering if anyone has any tips for how to accomplish this, or perhaps source code or a plug in. Python is preferable, but any language is fine. I'm just trying to get the general concept down.
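
For lists of modest size, the re-balancing described above can be done by brute force: try every pair of break points, discard any split in which a column ends on a title, and keep the split with the shortest tallest column. The sketch below assumes items arrive as (kind, text) pairs and that exactly three columns are wanted. The function name and the tie-break that favours taller left columns are choices made here for illustration.

```python
def split_three_columns(items):
    """items: list of (kind, text) with kind 'title' or 'link'.

    Returns three lists, or None when no widow-free split exists.
    """
    n = len(items)
    best_key, best = None, None
    for i in range(1, n - 1):
        for j in range(i + 1, n):
            cols = (items[:i], items[i:j], items[j:])
            if any(col[-1][0] == "title" for col in cols):
                continue                      # a title would be widowed at a column end
            heights = [len(col) for col in cols]
            key = (max(heights), -heights[0], -heights[1])
            if best_key is None or key < best_key:
                best_key, best = key, cols
    return best

plain = [("link", str(k)) for k in range(20)]
print([len(c) for c in split_three_columns(plain)])
```

Trying all break points is quadratic in the number of lines, which is fine for a page of links; a longer list would call for a smarter search.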

Converting to a column oriented array in Java

- by halfwarp

Although I have Java in the title, this could be for any OO language. I'd like to hear a few new ideas to improve the performance of something I'm trying to do. I have a method that is constantly receiving an Object[] array. I need to split the Objects in this array across multiple arrays (List or something), so that I have an independent list for each column of all the arrays the method receives. Example:

    List<List<Object>> columnOriented = new ArrayList<List<Object>>();

    public void newObject(Object[] obj) {
        for (int i = 0; i < obj.length; i++) {
            columnOriented.get(i).add(obj[i]);
        }
    }

Note: For simplicity I've omitted the initialization of objects and stuff. The code I've shown above is slow, of course. I've already tried a few other things, but would like to hear some new ideas. How would you do this, knowing it's very performance sensitive?

k-combinations of a set of integers in ascending size order

- by Adamski

Programming challenge: Given a set of integers [1, 2, 3, 4, 5] I would like to generate all possible k-combinations in ascending size order in Java; e.g. [1], [2], [3], [4], [5], [1, 2], [1, 3] ... [1, 2, 3, 4, 5] It is fairly easy to produce a recursive solution that generates all combinations and then sort them afterwards but I imagine there's a more efficient way that removes the need for the additional sort.
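
Generating the combinations one size at a time yields them already in ascending size order, so no sort is needed afterwards. The question asks for Java; the sketch below shows the idea in Python using the standard library's `itertools.combinations`. The function name is hypothetical.

```python
from itertools import chain, combinations

def combinations_by_size(values):
    """Yield all non-empty combinations, smallest size first, in lexicographic order within a size."""
    values = sorted(values)
    return chain.from_iterable(
        combinations(values, k) for k in range(1, len(values) + 1)
    )

result = list(combinations_by_size([1, 2, 3, 4, 5]))
print(result[:7], len(result))
```

The same structure carries over to Java with an outer loop over k and an inner routine that emits the k-combinations directly.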

Pointer-based binary heap implementation

- by Derek Chiang

Is it even possible to implement a binary heap using pointers rather than an array? I have searched around the internet (including SO) and could not find an answer. The main problem is: how do you keep track of the last pointer? When you insert X into the heap, you place X at the last pointer and then bubble it up. Now, where does the last pointer point to? And also, what happens when you want to remove the root? You exchange the root with the last element, and then bubble the new root down. Now, how do you know what the new "last element" is when you need to remove the root again?
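
One way to avoid tracking a "last pointer" at all is to store only the element count: in level order, the path from the root to the k-th node is given by the binary digits of k after the leading 1 (0 means left, 1 means right). The sketch below is a min-heap built on that idea; the class names are hypothetical and keys are swapped between nodes instead of relinking them.

```python
class _Node:
    def __init__(self, key, parent=None):
        self.key, self.left, self.right, self.parent = key, None, None, parent

class PointerHeap:
    """Min-heap on linked nodes; node k (1-based, level order) is located via the bits of k."""

    def __init__(self):
        self.root, self.size = None, 0

    def _node_at(self, k):
        node = self.root
        for bit in bin(k)[3:]:               # skip '0b' and the leading 1
            node = node.left if bit == "0" else node.right
        return node

    def push(self, key):
        self.size += 1
        if self.size == 1:
            self.root = _Node(key)
            return
        parent = self._node_at(self.size // 2)
        node = _Node(key, parent)
        if self.size % 2 == 0:
            parent.left = node
        else:
            parent.right = node
        while node.parent is not None and node.key < node.parent.key:
            node.key, node.parent.key = node.parent.key, node.key   # bubble up
            node = node.parent

    def pop(self):
        if self.size == 0:
            raise IndexError("pop from empty heap")
        top, last = self.root.key, self._node_at(self.size)
        self.size -= 1
        if last is self.root:
            self.root = None
            return top
        self.root.key = last.key             # move the last key to the root
        if last.parent.left is last:
            last.parent.left = None
        else:
            last.parent.right = None
        node = self.root
        while True:                          # bubble down
            smallest = node
            for child in (node.left, node.right):
                if child is not None and child.key < smallest.key:
                    smallest = child
            if smallest is node:
                return top
            node.key, smallest.key = smallest.key, node.key
            node = smallest

heap = PointerHeap()
for value in [5, 3, 8, 1, 9, 2, 7]:
    heap.push(value)
popped = [heap.pop() for _ in range(7)]
print(popped)
```

Finding the k-th node costs O(log n), the same order as the bubbling, so push and pop stay logarithmic.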

algorithm q: Fuzzy matching of structured data

- by user86432

I have a fairly small corpus of structured records sitting in a database. Given a tiny fraction of the information contained in a single record, submitted via a web form (so structured in the same way as the table schema), (let us call it the test record) I need to quickly draw up a list of the records that are the most likely matches for the test record, as well as provide a confidence estimate of how closely the search terms match a record. The primary purpose of this search is to discover whether someone is attempting to input a record that is duplicate to one in the corpus. There is a reasonable chance that the test record will be a dupe, and a reasonable chance the test record will not be a dupe. The records are about 12000 bytes wide and the total count of records is about 150,000. There are 110 columns in the table schema and 95% of searches will be on the top 5% most commonly searched columns. The data is stuff like names, addresses, telephone numbers, and other industry specific numbers. In both the corpus and the test record it is entered by hand and is semistructured within an individual field. You might at first blush say "weight the columns by hand and match word tokens within them", but it's not so easy. I thought so too: if I get a telephone number I thought that would indicate a perfect match. The problem is that there isn't a single field in the form whose token frequency does not vary by orders of magnitude. A telephone number might appear 100 times in the corpus or 1 time in the corpus. The same goes for any other field. This makes weighting at the field level impractical. I need a more fine-grained approach to get decent matching. My initial plan was to create a hash of hashes, top level being the fieldname. 
Then I would select all of the information from the corpus for a given field, attempt to clean up the data contained in it, and tokenize the sanitized data, hashing the tokens at the second level, with the tokens as keys and frequency as value. I would use the frequency count as a weight: the higher the frequency of a token in the reference corpus, the less weight I attach to that token if it is found in the test record. My first question is for the statisticians in the room: how would I use the frequency as a weight? Is there a precise mathematical relationship between n, the number of records, f(t), the frequency with which a token t appeared in the corpus, the probability o that a record is an original and not a duplicate, and the probability p that the test record is really a record x given the test and x contain the same t in the same field? How about the relationship for multiple token matches across multiple fields? Since I sincerely doubt that there is, is there anything that gets me close but is better than a completely arbitrary hack full of magic factors? Barring that, has anyone got a way to do this? I'm especially keen on other suggestions that do not involve maintaining another table in the database, such as a token frequency lookup table :). This is my first post on StackOverflow, thanks in advance for any replies you may see fit to give.
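
The "hash of hashes" plan can be sketched with an inverse-frequency weight borrowed from information retrieval: a token seen in f of n records contributes log(n / f), so rare tokens count for more and a token present everywhere counts for nothing. This is a heuristic, not the exact probability the question asks about. The field names and data below are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def build_field_frequencies(records):
    """records: list of dicts field -> text. Returns field -> Counter of
    token -> number of records that contain the token in that field."""
    freq = defaultdict(Counter)
    for rec in records:
        for field, text in rec.items():
            freq[field].update(set(text.lower().split()))
    return freq

def score(test_record, candidate, freq, n):
    """Sum of log(n / f) over tokens shared per field.
    Assumes candidate is one of the records used to build freq."""
    total = 0.0
    for field, text in test_record.items():
        candidate_tokens = set(candidate.get(field, "").lower().split())
        for token in set(text.lower().split()) & candidate_tokens:
            total += math.log(n / freq[field][token])
    return total

corpus = [
    {"name": "john smith", "phone": "555-1000"},
    {"name": "john doe", "phone": "555-2000"},
    {"name": "jane roe", "phone": "555-1000"},
]
freq = build_field_frequencies(corpus)
test = {"name": "john smith"}
scores = [score(test, rec, freq, len(corpus)) for rec in corpus]
print(scores)
```

Ranking the corpus by this score gives the candidate list; turning a score into a calibrated confidence would need labelled duplicate pairs, which the question does not mention having.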

How to reverse a number as an integer and not as a string?

- by Pritam

I came across a question: "How can one reverse a number as an integer and not as a string?" Could anyone please help me find the answer?
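
The usual arithmetic approach peels off the last digit with division by 10 and appends it to a running result. A minimal sketch with a hypothetical function name; it keeps the sign and, as integers must, drops what would become leading zeros.

```python
def reverse_integer(n):
    """Reverse the decimal digits of n using arithmetic only."""
    sign = -1 if n < 0 else 1
    n, result = abs(n), 0
    while n:
        n, digit = divmod(n, 10)       # split off the last digit
        result = result * 10 + digit   # append it to the reversed number
    return sign * result

print(reverse_integer(12345), reverse_integer(1200), reverse_integer(-907))
```

In a language with fixed-width integers the reversed value can overflow, which Python's arbitrary-precision integers hide.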

Calculating holidays

- by Ralph Shillington

A number of holidays move around from year to year. For example, in Canada, Victoria Day (aka the May two-four weekend) is the Monday before May 25th, and Thanksgiving is the 2nd Monday of October. I've been using variations on this LINQ query to get the date of a holiday for a given year:

    var year = 2011;
    var month = 10;
    var dow = DayOfWeek.Monday;
    var instance = 2;
    var day = (from d in Enumerable.Range(1, DateTime.DaysInMonth(year, month))
               let sample = new DateTime(year, month, d)
               where sample.DayOfWeek == dow
               select sample).Skip(instance - 1).Take(1);

While this works, and is easy enough to understand, I imagine there is a more elegant way of making this calculation than this brute-force approach. Of course, this doesn't touch on holidays such as Easter and the many other lunar-based dates.
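
Both rules mentioned above reduce to modular arithmetic on the weekday of a single reference date, with no need to enumerate the month. The sketch below uses Python's `datetime` instead of the asker's C#; the function names are hypothetical and weekdays follow Python's convention (Monday is 0).

```python
from datetime import date, timedelta

def nth_weekday(year, month, weekday, n):
    """Date of the n-th (1-based) given weekday in a month; Monday=0 ... Sunday=6.
    An n that is too large runs over into the following month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7      # days from the 1st to the first such weekday
    return first + timedelta(days=offset + 7 * (n - 1))

def last_weekday_before(year, month, day, weekday):
    """Latest date strictly before year-month-day that falls on the given weekday."""
    d = date(year, month, day) - timedelta(days=1)
    return d - timedelta(days=(d.weekday() - weekday) % 7)

print(nth_weekday(2011, 10, 0, 2))           # Thanksgiving rule for 2011
print(last_weekday_before(2011, 5, 25, 0))   # Victoria Day rule for 2011
```

Lunar-based dates such as Easter need a separate algorithm and are not covered by this sketch.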

How can I group an array of rectangles into "Islands" of connected regions?

- by Eric

The problem

I have an array of java.awt.Rectangles. For those who are not familiar with this class, the important piece of information is that they provide an .intersects(Rectangle b) function. I would like to write a function that takes this array of Rectangles and breaks it up into groups of connected rectangles. Let's say, for example, that these are my rectangles (the constructor takes the arguments x, y, width, height):

    Rectangle[] rects = new Rectangle[] {
        new Rectangle(0, 0, 4, 2), //A
        new Rectangle(1, 1, 2, 4), //B
        new Rectangle(0, 4, 8, 2), //C
        new Rectangle(6, 0, 2, 2)  //D
    }

A quick drawing shows that A intersects B and B intersects C. D intersects nothing. A tediously drawn piece of ASCII art does the job too:

    +-------+   +---+
    |A+---+ |   | D |
    +-+---+-+   +---+
      | B |
    +-+---+---------+
    | +---+    C    |
    +---------------+

Therefore, the output of my function should be:

    new Rectangle[][]{
        new Rectangle[] {A,B,C},
        new Rectangle[] {D}
    }

The failed code

This was my attempt at solving the problem:

    public List<Rectangle> getIntersections(ArrayList<Rectangle> list, Rectangle r) {
        List<Rectangle> intersections = new ArrayList<Rectangle>();
        for(Rectangle rect : list) {
            if(r.intersects(rect)) {
                list.remove(rect);
                intersections.add(rect);
                intersections.addAll(getIntersections(list, rect));
            }
        }
        return intersections;
    }

    public List<List<Rectangle>> mergeIntersectingRects(Rectangle... rectArray) {
        List<Rectangle> allRects = new ArrayList<Rectangle>(rectArray);
        List<List<Rectangle>> groups = new ArrayList<ArrayList<Rectangle>>();
        for(Rectangle rect : allRects) {
            allRects.remove(rect);
            ArrayList<Rectangle> group = getIntersections(allRects, rect);
            group.add(rect);
            groups.add(group);
        }
        return groups;
    }

Unfortunately, there seems to be an infinite recursion loop going on here. My uneducated guess would be that Java does not like me doing this:

    for(Rectangle rect : allRects) {
        allRects.remove(rect);
        //...
    }

Can anyone shed some light on the issue?
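
The grouping itself is a connected-components search. Removing elements from a Java list inside a for-each loop over that same list normally fails with a ConcurrentModificationException, so the sketch below collects the matches first and removes them afterwards. It is written in Python with rectangles as (x, y, width, height) tuples. The `intersects` helper is a stand-in written for this example, not the java.awt method.

```python
def intersects(a, b):
    """a, b are (x, y, width, height); True when the two areas overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def group_rectangles(rects):
    """Return groups of indices into rects; each group is one connected island."""
    unvisited = set(range(len(rects)))
    groups = []
    while unvisited:
        start = min(unvisited)
        unvisited.remove(start)
        group, frontier = [start], [start]
        while frontier:
            current = frontier.pop()
            # Collect first, then remove, so the set is never modified while iterating.
            linked = [i for i in unvisited if intersects(rects[current], rects[i])]
            for i in linked:
                unvisited.remove(i)
            group.extend(linked)
            frontier.extend(linked)
        groups.append(sorted(group))
    return groups

rects = [(0, 0, 4, 2), (1, 1, 2, 4), (0, 4, 8, 2), (6, 0, 2, 2)]   # A, B, C, D
print(group_rectangles(rects))
```

Each rectangle is taken out of `unvisited` exactly once, so the search always terminates; the pairwise tests make it quadratic in the worst case.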

Dot Game and Dynamic Programming

- by Albert Diego

I'm trying to solve a variant of the dot game with dynamic programming. The regular dot game is played with a line of dots. Each player takes either one or two dots at their respective end of the line, and the person who is left with no dots to take wins. In this version of the game, each dot has a different value. The players take alternate turns, and on each turn a player takes the dot at either end of the line. I want to come up with a way to use dynamic programming to find the maximum amount that the first player is guaranteed to win. I'm having trouble wrapping my head around this and writing a recurrence for the solution. Any help is appreciated, thanks!
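
One possible recurrence works on the score margin: if margin(i, j) is the best lead the player to move can secure on dots i..j, then taking the left dot gives values[i] - margin(i+1, j) and taking the right dot gives values[j] - margin(i, j-1). The sketch below assumes both players play optimally and that the values are integers. The function names are hypothetical.

```python
from functools import lru_cache

def best_first_player_total(values):
    """Largest total the first player can guarantee when both players play optimally."""
    values = tuple(values)

    @lru_cache(maxsize=None)
    def margin(i, j):
        """Best (own total - opponent total) for the player to move on values[i..j]."""
        if i > j:
            return 0
        return max(values[i] - margin(i + 1, j), values[j] - margin(i, j - 1))

    # total = mine + theirs and margin = mine - theirs, so mine = (total + margin) / 2
    return (sum(values) + margin(0, len(values) - 1)) // 2

print(best_first_player_total([8, 15, 3, 7]))
```

There are O(n^2) sub-intervals and each is solved once. A very long line of dots would need an iterative table to stay within Python's recursion limit.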

How do I find all paths through a set of given nodes in a DAG?

- by Hanno Fietz

I have a list of items (blue nodes below) which are categorized by the users of my application. The categories themselves can be grouped and categorized themselves. The resulting structure can be represented as a Directed Acyclic Graph (DAG) where the items are sinks at the bottom of the graph's topology and the top categories are sources. Note that while some of the categories might be well defined, a lot are going to be user defined and might be very messy.

Example: (figure omitted)

On that structure, I want to perform the following operations:

- find all items (sinks) below a particular node (all items in Europe)
- find all paths (if any) that pass through all of a set of n nodes (all items sent via SMTP from example.com)
- find all nodes that lie below all of a set of nodes (intersection: goyish brown foods)

The first seems quite straightforward: start at the node, follow all possible paths to the bottom and collect the items there. However, is there a faster approach? Remembering the nodes I have already passed through probably helps avoid unnecessary repetition, but are there more optimizations?

How do I go about the second one? It seems that the first step would be to determine the height of each node in the set, so as to determine at which one(s) to start, and then find all paths below that which include the rest of the set. But is this the best (or even a good) approach?

The graph traversal algorithms listed at Wikipedia all seem to be concerned with either finding a particular node or the shortest or otherwise most effective route between two nodes. I think neither is what I want, or did I just fail to see how this applies to my problem? Where else should I read?
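
The first and third operations can be sketched as a depth-first search with a visited set, followed by a set intersection. The graph is assumed to be a dict from each node to its list of children; the category names below are invented for illustration and the function names are hypothetical.

```python
def items_below(graph, start):
    """graph: node -> list of child nodes. Returns the set of sinks reachable from start."""
    seen, stack, sinks = {start}, [start], set()
    while stack:
        node = stack.pop()
        children = graph.get(node, [])
        if not children:
            sinks.add(node)                 # no children: this is an item
        for child in children:
            if child not in seen:           # each node is expanded at most once
                seen.add(child)
                stack.append(child)
    return sinks

def items_below_all(graph, nodes):
    """Items that lie below every node in `nodes`."""
    sets = [items_below(graph, n) for n in nodes]
    return set.intersection(*sets) if sets else set()

graph = {
    "food": ["brown", "kosher"],
    "brown": ["bread", "chocolate"],
    "kosher": ["bread", "matzo"],
}
print(items_below(graph, "food"), items_below_all(graph, ["brown", "kosher"]))
```

If the same queries repeat often, caching the item set per category trades memory for speed; whether that pays off depends on how often users recategorize.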

Whats the difference between Paxos and W+R>=N in Cassandra?

- by user1128016

Dynamo-like databases (e.g. Cassandra) provide the ability to enforce consistency by means of a quorum, i.e. the number of synchronously written replicas (W) and the number of replicas to read (R) should be chosen in such a way that W+R>N, where N is the replication factor. On the other hand, Paxos-based systems like ZooKeeper are also used as consistent fault-tolerant storage. What is the difference between these two approaches? Does Paxos provide guarantees that are not provided by the W+R>N scheme?
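
The W+R>N condition has a simple combinatorial meaning: every set of W written replicas shares at least one replica with every set of R read replicas. The brute-force check below illustrates only that overlap property. It says nothing about ordering, concurrent writers or failures, which is where the comparison with Paxos lies. The function name is hypothetical.

```python
from itertools import combinations

def quorums_always_overlap(n, w, r):
    """True when every choice of w write replicas intersects every choice of r read replicas."""
    replicas = range(n)
    return all(
        set(ws) & set(rs)
        for ws in combinations(replicas, w)
        for rs in combinations(replicas, r)
    )

print(quorums_always_overlap(3, 2, 2), quorums_always_overlap(3, 1, 2))
```

With N=3, W=2 and R=2 every read touches at least one replica that saw the latest acknowledged write; with W=1 and R=2 a read can miss it.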

Finding the largest subtree in a BST

- by rakeshr

Given a binary tree, I want to find the largest subtree in it that is a BST. Naive approach: I visit every node of the tree and pass this node to an isBST function. I also keep track of the number of nodes in a subtree if it is a BST. Is there a better approach than this?
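
A single post-order pass can replace the repeated isBST calls: each node reports whether its subtree is a BST together with its size and its minimum and maximum keys, so the parent can decide in constant time. The sketch assumes "subtree" means a node with all of its descendants and that keys are distinct. The `Node` class and function name are hypothetical.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def largest_bst_subtree_size(root):
    """Size of the largest subtree (a node plus all its descendants) that is a BST."""
    best = 0

    def visit(node):
        """Return (is_bst, size, min_key, max_key) for the subtree rooted at node."""
        nonlocal best
        if node is None:
            return True, 0, float("inf"), float("-inf")
        l_ok, l_size, l_min, l_max = visit(node.left)
        r_ok, r_size, r_min, r_max = visit(node.right)
        if l_ok and r_ok and l_max < node.key < r_min:
            size = l_size + r_size + 1
            best = max(best, size)
            return True, size, min(l_min, node.key), max(r_max, node.key)
        return False, 0, 0, 0        # not a BST; the other fields are ignored by callers

    visit(root)
    return best

tree = Node(10, Node(5, Node(1), Node(8)), Node(15, None, Node(7)))
print(largest_bst_subtree_size(tree))
```

Every node is visited once, so the pass is linear, compared with the quadratic worst case of calling isBST from every node.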
