Search Results

Search found 5079 results on 204 pages for 'bankers algorithm'.

Page 185/204 | < Previous Page | 181 182 183 184 185 186 187 188 189 190 191 192 | Next Page >

Parse a CSV file using python (to make a decision tree later)

- by Margaret

First off, full disclosure: This is going towards a uni assignment, so I don't want to receive code. :). I'm more looking for approaches; I'm very new to python, having read a book but not yet written any code. The entire task is to import the contents of a CSV file, create a decision tree from the contents of the CSV file (using the ID3 algorithm), and then parse a second CSV file to run against the tree. There's a big (understandable) preference to have it capable of dealing with different CSV files (I asked if we were allowed to hard code the column names, mostly to eliminate it as a possibility, and the answer was no). The CSV files are in a fairly standard format; the header row is marked with a # then the column names are displayed, and every row after that is a simple series of values. Example: # Column1, Column2, Column3, Column4 Value01, Value02, Value03, Value04 Value11, Value12, Value13, Value14 At the moment, I'm trying to work out the first part: parsing the CSV. To make the decisions for the decision tree, a dictionary structure seems like it's going to be the most logical; so I was thinking of doing something along these lines: Read in each line, character by character If the character is not a comma or a space Append character to temporary string If the character is a comma Append the temporary string to a list Empty string Once a line has been read Create a dictionary using the header row as the key (somehow!) Append that dictionary to a list However, if I do things that way, I'm not sure how to make a mapping between the keys and the values. I'm also wondering whether there is some way to perform an action on every dictionary in a list, since I'll need to be doing things to the effect of "Everyone return their values for columns Column1 and Column4, so I can count up who has what!" - I assume that there is some mechanism, but I don't think I know how to do it. Is a dictionary the best way to do it? Would I be better off doing things using some other data structure? If so, what?

Read the article
algorithim for simple colision detection in Java

- by Bob Twinkles

I'm not very experienced with Java, just started a couple weeks ago, but I have a simple applet that has two user controlled balls, drawn through java.awt and I need a way to detect a collision with between them. I have an algorithm for detecting collision with the walls: while (true){ if (xPositon > (300 - radius)){ xSpeed = -xSpeed; } else if (xPositon < radius){ xSpeed = -xSpeed; } else if (yPositon > (300 - radius)) { ySpeed = -ySpeed; } else if (yPositon < radius){ ySpeed = -ySpeed; } xPositon += xSpeed; yPositon += ySpeed; and for the second ball if (xPositon2 > (300 - radius)){ xSpeed2 = -xSpeed2; } else if (xPositon2 < radius){ xSpeed2 = -xSpeed2; } else if (yPositon2 > (300 - radius)) { ySpeed2 = -ySpeed2; } else if (yPositon2 < radius){ ySpeed2 = -ySpeed2; } xPositon2 += xSpeed2; yPositon2 += ySpeed2; the applet is 300 pixels by 300 pixels radius stores the radius of the circles xPositon and xPositon2 store the x cordanents for the two balls yPositon and yPositon store the y cordanents for the two balls xSpeed and xSpeed2 store the x velocities for the two balls ySpeed and ySpeed2 store the y velocities for the two balls I've only taken algebra 1 so please no advanced math or physics.

Read the article
iPhone: How to Determine Average Light/Dark of an Area of an UIImage

- by TechZen

I need to place labels with a transparent background over a variable-content UIImage. Readability will vary significantly depending on the relationship between the color of the label's text and the color/luminosity of the area of the image displayed under the label. Since the image will be constantly changing, the color of the label's text needs to change in sync. I have found several techniques for determining the color, perceived luminosity etc of a single pixel. However, I need to rather quickly (while a view loads) determine the rough perceived color/luminosity of an area of the UIImage under the frame of the UILabel. I presume I will also need to measure the alpha because the same color/luminosity looks different at different alpha values. Is there a way to calculate such a value for an area? Will I be reduced to simply summing pixels? If it comes to that, is there an algorithm to accomplish this? I've thought of two possible approaches: Perform some "folding" operations i.e. combining pixels from one half of the area to the other half. Then repeat until I get a single value. Would this be practical? How would you logically combine pixels to average their perceived color/luminosity? Sample a statistically significant number of pixels in the area and then combine them (somehow) to get a rough measure. I think this problem comes up a lot these days with people being so found of customizing backgrounds. Seems like something that would be worth my time to bang out a category or class to handle this and then share it around.

Read the article
How to change the date/time in Python for all modules?

- by Felix Schwarz

When I write with business logic, my code often depends on the current time. For example the algorithm which looks at each unfinished order and checks if an invoice should be sent (which depends on the no of days since the job was ended). In these cases creating an invoice is not triggered by an explicit user action but by a background job. Now this creates a problem for me when it comes to testing: I can test invoice creation itself easily However it is hard to create an order in a test and check that the background job identifies the correct orders at the correct time. So far I found two solutions: In the test setup, calculate the job dates relative to the current date. Downside: The code becomes quite complicated as there are no explicit dates written anymore. Sometimes the business logic is pretty complex for edge cases so it becomes hard to debug due to all these relative dates. I have my own date/time accessor functions which I use throughout my code. In the test I just set a current date and all modules get this date. So I can simulate an order creation in February and check that the invoice is created in April easily. Downside: 3rd party modules do not use this mechanism so it's really hard to integrate+test these. The second approach was way more successful to me after all. Therefore I'm looking for a way to set the time Python's datetime+time modules return. Setting the date is usually enough, I don't need to set the current hour or second (even though this would be nice). Is there such a utility? Is there an (internal) Python API that I can use?

Read the article
Problem with Boost::Asio for C++

- by Martin Lauridsen

Hi there, For my bachelors thesis, I am implementing a distributed version of an algorithm for factoring large integers (finding the prime factorisation). This has applications in e.g. security of the RSA cryptosystem. My vision is, that clients (linux or windows) will download an application and compute some numbers (these are independant, thus suited for parallelization). The numbers (not found very often), will be sent to a master server, to collect these numbers. Once enough numbers have been collected by the master server, it will do the rest of the computation, which cannot be easily parallelized. Anyhow, to the technicalities. I was thinking to use Boost::Asio to do a socket client/server implementation, for the clients communication with the master server. Since I want to compile for both linux and windows, I thought windows would be as good a place to start as any. So I downloaded the Boost library and compiled it, as it said on the Boost Getting Started page: bootstrap .\bjam It all compiled just fine. Then I try to compile one of the tutorial examples, client.cpp, from Asio, found (here.. edit: cant post link because of restrictions). I am using the Visual C++ compiler from Microsoft Visual Studio 2008, like this: cl /EHsc /I D:\Downloads\boost_1_42_0 client.cpp But I get this error: /out:client.exe client.obj LINK : fatal error LNK1104: cannot open file 'libboost_system-vc90-mt-s-1_42.lib' Anyone have any idea what could be wrong, or how I could move forward? I have been trying pretty much all week, to get a simple client/server socket program for c++ working, but with no luck. Serious frustration kicking in. Thank you in advance.

Read the article
Parallel.For Batching

- by chibacity

Is there built-in support in the TPL for batching operations? I was recently playing with a routine to carry out character replacement on a character array which required a lookup table i.e. transliteration: for (int i = 0; i < chars.Length; i++) { char replaceChar; if (lookup.TryGetValue(chars[i], out replaceChar)) { chars[i] = replaceChar; } } I could see that this could be trivially parallelized, so jumped in with a first stab which I knew would perform worse as the tasks were too fine-grained: Parallel.For(0, chars.Length, i => { char replaceChar; if (lookup.TryGetValue(chars[i], out replaceChar)) { chars[i] = replaceChar; } }); I then reworked the algorithm to use batching so that the work could be chunked onto different threads in less fine-grained batches. This made use of threads as expected and I got some near linear speed up. I'm sure that there must be built-in support for batching in the TPL. What is the syntax, and how do I use it? const int CharBatch = 100; int charLen = chars.Length; Parallel.For(0, ((charLen / CharBatch) + 1), i => { int batchUpper = ((i + 1) * CharBatch); for (int j = i * CharBatch; j < batchUpper && j < charLen; j++) { char replaceChar; if (lookup.TryGetValue(chars[j], out replaceChar)) { chars[j] = replaceChar; } } });

Read the article
Creating subtree from tree which is represented in xml - python

- by Jay

Hi I have an XML (in the form of tree), I require to create sub-tree out of it. For ex: <a> <b> <c>Hello</c> <d> <e>Hi</e> </a> Subtree would be <root> <a> <b> <c>Hello</c> </b> </a> <a> <d> <e>Hi</e> </d> </a> </root> What is the best XML library in python to do it? Any algorithm that already does this would also be helpful. Note: the XML doc won't be that big, it will easily fit in memory.

Read the article
How do I optimize this postfix expression tree for speed?

- by Peter Stewart

Thanks to the help I received in this post: I have a nice, concise recursive function to traverse a tree in postfix order: deque <char*> d; void Node::postfix() { if (left != __nullptr) { left->postfix(); } if (right != __nullptr) { right->postfix(); } d.push_front(cargo); return; }; This is an expression tree. The branch nodes are operators randomly selected from an array, and the leaf nodes are values or the variable 'x', also randomly selected from an array. char *values[10]={"1.0","2.0","3.0","4.0","5.0","6.0","7.0","8.0","9.0","x"}; char *ops[4]={"+","-","*","/"}; As this will be called billions of times during a run of the genetic algorithm of which it is a part, I'd like to optimize it for speed. I have a number of questions on this topic which I will ask in separate postings. The first is: how can I get access to each 'cargo' as it is found. That is: instead of pushing 'cargo' onto a deque, and then processing the deque to get the value, I'd like to start processing it right away. I don't yet know about parallel processing in c++, but this would ideally be done concurrently on two different processors. In python, I'd make the function a generator and access succeeding 'cargo's using .next(). But I'm using c++ to speed up the python implementation. I'm thinking that this kind of tree has been around for a long time, and somebody has probably optimized it already. Any Ideas? Thanks

Read the article
String Manipulation in C

- by baris_a

Hi guys, I am helping my nephew for his C lab homework, it is a string manipulation assignment and applying Wang's algorithm. Here is the BNF representation for the input. <sequent> ::= <lhs> # <rhs> <lhs> ::= <formulalist>| e <rhs> ::= <formulalist>| e <formulalist> ::= <formula>|<formula> , <formulalist> <formula> ::= <letter>| - <formula>| (<formula><in?xop><formula>) <in?xop> ::= & | | | > <letter> ::= A | B | ... | Z What is the best practice to handle and parse this kind of input in C? How can I parse this structure without using struct? Thanks in advance.

Read the article
Are these two functions the same?

- by Ranhiru

There is a function in the AES algorithm, to multiply a byte by 2 in Galois Field. This is the function given in a website private static byte gfmultby02(byte b) { if (b < 0x80) return (byte)(int)(b <<1); else return (byte)( (int)(b << 1) ^ (int)(0x1b) ); } This is the function i wrote. private static byte MulGF2(byte x) { if (x < 0x80) return (byte)(x << 1); else { return (byte)((x << 1) ^ 0x1b); } } What i need to know is, given any byte whether this will perform in the same manner. Actually I am worried about the extra conversion of to int and then again to byte. So far I have tested and it looks fine. Does the extra cast to int and then to byte make a difference?

Read the article
Is learning C++ a good idea?

- by chang

The more I hear and read about C++ (e.g. this: http://lwn.net/Articles/249460/), I get the impression, that I'd waste my time learning C++. I some wrote network routing algorithm in C++ for a simulator, and it was a pain (as expected, especially coming from a perl/python/Java background ...). I'm never happy about giving up on some technology, but I would be happy, if I could limit my knowledge of C-family languages to just C, C# and Objective-C (even OS Xs Cocoa, which is huge and takes a lot of time to learn looks like joy compared to C++ ...). Do I need to consider myself dumb or unwilling, just because I'm not partial to the pain involved learning this stuff? Technologies advance and there will be options other than C++, when deciding on implementation languages, or not? And for speed: If speed were that critical, I'd go for a plain C implementation instead, or write C extensions for much more productive languages like ruby or python ... The one-line version of the above: Will C++ stay such a relevant language that every committed programmer should be familiar with it? [ edit / thank you very much for your interesting and useful answers so far .. ] [ edit / .. i am accepting the top-rated answer; thanks again for all answers! ]

Read the article
String literal recognition problem

- by helicera

Hello! I'm trying to recognize string literal by reading string per symbol. Here is a sample code: #region [String Literal (")] case '"': // {string literal ""} { // skipping '"' ChCurrent = Line.ElementAtOrDefault<Char>(++ChPosition); while(ChCurrent != '"') { Value.Append(ChCurrent); ChCurrent = Line.ElementAtOrDefault<Char>(++ChPosition); if(ChCurrent == '"') { // "" sequence only acceptable if(Line.ElementAtOrDefault<Char>(ChPosition + 1) == '"') { Value.Append(ChCurrent); // skip 2nd double quote ChPosition++; // move position next ChCurrent = Line.ElementAtOrDefault<Char>(++ChPosition); } } else if(default(Char) == ChCurrent) { // message: unterminated string throw new ScanningException(); } } ChPosition++; break; } #endregion When I run test: [Test] [ExpectedException(typeof(ScanningException))] public void ScanDoubleQuotedStrings() { this.Scanner.Run(@"""Hello Language Design""", default(System.Int32)); this.Scanner.Run(@"""Is there any problems with the """"strings""""?""", default(System.Int32)); this.Scanner.Run(@"""v#:';?325;.<>,|+_)""(*&^%$#@![]{}\|-_=""", default(System.Int32)); while(0 != this.Scanner.TokensCount - 1) { Assert.AreEqual(Token.TokenClass.StringLiteral, this.Scanner.NextToken.Class); } } It passes with success.. while I'm expecting to have an exception according to unmatched " mark in this.Scanner.Run(@"""v#:';?325;.<>,|+_)""(*&^%$#@![]{}\|-_=""", default(System.Int32)); Can anyone explain where is my mistake or give an advice on algorithm.

Read the article
What about race condition in multithreaded reading?

- by themoob

Hi, According to an article on IBM.com, "a race condition is a situation in which two or more threads or processes are reading or writing some shared data, and the final result depends on the timing of how the threads are scheduled. Race conditions can lead to unpredictable results and subtle program bugs." . Although the article concerns Java, I have in general been taught the same definition. As far as I know, simple operation of reading from RAM is composed of setting the states of specific input lines (address, read etc.) and reading the states of output lines. This is an operation that obviously cannot be executed simultaneously by two devices and has to be serialized. Now let's suppose we have a situation when a couple of threads access an object in memory. In theory, this access should be serialized in order to prevent race conditions. But e.g. the readers/writers algorithm assumes that an arbitrary number of readers can use the shared memory at the same time. So, the question is: does one have to implement an exclusive lock for read when using multithreading (in WinAPI e.g.)? If not, why? Where is this control implemented - OS, hardware? Best regards, Kuba

Read the article
Decimal problem in Java

- by Jerome

I am experimenting with writing a java class for financial engineering (hobby of mine). The algorithm that I desire to implement is: 100 / 0.05 = 2000 Instead I get: 100 / 0.05 = 2000.9999999999998 I understand the problem to be one with converting from binary numbers to decimals (float -- int). I have looked at the BigDecimal class on the web but have not quite found what I am looking for. Attached is the program and a class that I wrote: // DCF class public class DCF { double rate; double principle; int period; public DCF(double $rate, double $principle, int $period) { rate = $rate; principle = $principle; period = $period; } // returns the console value public double consol() { return principle/rate; } // to string Method public String toString() { return "(" + rate + "," + principle + "," + period + ")"; } } Now the actual program: // consol program public class DCFmain { public static void main(String[] args) { // create new DCF DCF abacus = new DCF(0.05, 100.05, 5); System.out.println(abacus); System.out.println("Console value= " + abacus.consol() ); } } Output: (0.05,100.05,5) Console value= 2000.9999999999998 Thanks!

Read the article
How to keep Java Frame from waiting?

- by pypmannetjies

I am writing a genetic algorithm that approximates an image with a polygon. While going through the different generations, I'd like to output the progress to a JFrame. However, it seems like the JFrame waits until the GA's while loop finishes to display something. I don't believe it's a problem like repainting, since it eventually does display everything once the while loop exits. I want to GUI to update dynamically even when the while loop is running. Here is my code: while (some conditions) { //do some other stuff gui.displayPolygon(best); gui.displayFitness(fitness); gui.setVisible(true); } public void displayPolygon(Polygon poly) { BufferedImage bpoly = ImageProcessor.createImageFromPoly(poly); ImageProcessor.displayImage(bpoly, polyPanel); this.setVisible(true); } public static void displayImage(BufferedImage bimg, JPanel panel) { panel.removeAll(); panel.setBounds(0, 0, bimg.getWidth(), bimg.getHeight()); JImagePanel innerPanel = new JImagePanel(bimg, 25, 25); panel.add(innerPanel); innerPanel.setLocation(25, 25); innerPanel.setVisible(true); panel.setVisible(true); }

Read the article
LINQ : How to query how to sort result by most similarity/equality

- by aNui

I want to do a search for Music instruments which has its informations Name, Category and Origin as I asked in my post. But now I want to sort/group the result by similarity/equality to the keyword such as. If I have the list { Harp, Piano, Drum, Guitar, Guitarrón } and if I queried "p" the result should be { Piano, Harp } but it shows Harp first because of the list's sequence and if I add {Grand Piano} to the list and query "piano" the result shoud be like { Piano, Grand Piano } here's my code static IEnumerable<MInstrument> InstrumentsSearch(IEnumerable<MInstrument> InstrumentsList, string query, MInstrument.Category[] SelectedCategories, MInstrument.Origin[] SelectedOrigins) { var result = InstrumentsList .Where(item => SelectedCategories.Contains(item.category)) .Where(item => SelectedOrigins.Contains(item.origin)) .Where(item => { if ( (" " + item.Name.ToLower()).Contains(" " + query.ToLower()) || item.Name.IndexOf(query) != -1 ) { return true; } return false; } ) .Take(30); return result.ToList<MInstrument>(); } Or the result may be like my old self-invented algorithm that I called "by order of occurence", that is just OK to me. Is there any way to do that, please tell me. Thanks in advance.

Read the article
How to improve multi-threaded access to Cache (custom implementation)

- by Andy

I have a custom Cache implementation, which allows to cache TCacheable<TKey> descendants using LRU (Least Recently Used) cache replacement algorithm. Every time an element is accessed, it is bubbled up to the top of the LRU queue using the following synchronized function: // a single instance is created to handle all TCacheable<T> elements public class Cache() { private object syncQueue = new object(); private void topQueue(TCacheable<T> el) { lock (syncQueue) if (newest != el) { if (el.elder != null) el.elder.newer = el.newer; if (el.newer != null) el.newer.elder = el.elder; if (oldest == el) oldest = el.newer; if (oldest == null) oldest = el; if (newest != null) newest.newer = el; el.newer = null; el.elder = newest; newest = el; } } } The bottleneck in this function is the lock() operator, which limits cache access to just one thread at a time. Question: Is it possible to get rid of lock(syncQueue) in this function while still preserving the queue integrity?

Read the article
Constraint to array dimension in C language

- by Summer_More_More_Tea

int KMP( const char *original, int o_len, const char *substring, int s_len ){ if( o_len < s_len ) return -1; int k = 0; int cur = 1; int fail[ s_len ]; fail[ k ] = -1; while( cur < s_len ){ k = cur - 1; do{ if( substring[ cur ] == substring[ k ] ){ fail[ cur ] = k; break; }else{ k = fail[ k ] + 1; } }while( k ); if( !k && ( substring[ cur ] != substring[ 0 ] ) ){ fail[ cur ] = -1; }else if( !k ){ fail[ cur ] = 0; } cur++; } k = 0; cur = 0; while( ( k < s_len ) && ( cur < o_len ) ){ if( original[ cur ] == substring[ k ] ){ cur++; k++; }else{ if( k == 0 ){ cur++; }else{ k = fail[ k - 1 ] + 1; } } } if( k == s_len ) return cur - k; else return -1; } This is a KMP algorithm I once coded. When I reviewed it this morning, I find it strange that an integer array is defined as int fail[ s_len ]. Does the specification requires dimesion of arrays compile-time constant? How can this code pass the compilation? By the way, my gcc version is 4.4.1. Thanks in advance!

Read the article
Simplification / optimization of GPS track

- by GreyCat

I've got a GPS track, produces by gpxlogger(1) (supplied as a client for gpsd). GPS receiver updates its coordinates every 1 second, gpxlogger's logic is very simple, it writes down location (lat, lon, ele) and a timestamp (time) received from GPS every n seconds (n = 3 in my case). After writing down a several hours worth of track, gpxlogger saves several megabyte long GPX file that includes several thousands of points. Afterwards, I try to plot this track on a map and use it with OpenLayers. It works, but several thousands of points make using the map a sloppy and slow experience. I understand that having several thousands of points of suboptimal. There are myriads of points that can be deleted without losing almost anything: when there are several points making up roughly the straight line and we're moving with the same constant speed between them, we can just leave the first and the last point and throw anything else. I thought of using gpsbabel for such track simplification / optimization job, but, alas, it's simplification filter works only with routes, i.e. analyzing only geometrical shape of path, without timestamps (i.e. not checking that the speed was roughly constant). Is there some ready-made utility / library / algorithm available to optimize tracks? Or may be I'm missing some clever option with gpsbabel?

Read the article
Manhattan Heuristic function for A-star (A*)

- by Shawn Mclean

I found this algorithm here. I have a problem, I cant seem to understand how to set up and pass my heuristic function. static public Path<TNode> AStar<TNode>(TNode start, TNode destination, Func<TNode, TNode, double> distance, Func<TNode, double> estimate) where TNode : IHasNeighbours<TNode> { var closed = new HashSet<TNode>(); var queue = new PriorityQueue<double, Path<TNode>>(); queue.Enqueue(0, new Path<TNode>(start)); while (!queue.IsEmpty) { var path = queue.Dequeue(); if (closed.Contains(path.LastStep)) continue; if (path.LastStep.Equals(destination)) return path; closed.Add(path.LastStep); foreach (TNode n in path.LastStep.Neighbours) { double d = distance(path.LastStep, n); var newPath = path.AddStep(n, d); queue.Enqueue(newPath.TotalCost + estimate(n), newPath); } } return null; } As you can see, it accepts 2 functions, a distance and a estimate function. Using the Manhattan Heuristic Distance function, I need to take 2 parameters. Do I need to modify his source and change it to accepting 2 parameters of TNode so I can pass a Manhattan estimate to it? This means the 4th param will look like this: Func<TNode, TNode, double> estimate) where TNode : IHasNeighbours<TNode> and change the estimate function to: queue.Enqueue(newPath.TotalCost + estimate(n, path.LastStep), newPath); My Manhattan function is: private float manhattanHeuristic(Vector3 newNode, Vector3 end) { return (Math.Abs(newNode.X - end.X) + Math.Abs(newNode.Y - end.Y)); }

Read the article
Thread-safety of read-only memory access

- by Edmund

I've implemented the Barnes-Hut gravity algorithm in C as follows: Build a tree of clustered stars. For each star, traverse the tree and apply the gravitational forces from each applicable node. Update the star velocities and positions. Stage 2 is the most expensive stage, and so is implemented in parallel by dividing the set of stars. E.g. with 1000 stars and 2 threads, I have one thread processing the first 500 stars and the second thread processing the second 500. In practice this works: it speeds the computation by about 30% with two threads on a two-core machine, compared to the non-threaded version. Additionally, it yields the same numerical results as the original non-threaded version. My concern is that the two threads are accessing the same resource (namely, the tree) simultaneously. I have not added any synchronisation to the thread workers, so it's likely they will attempt to read from the same location at some point. Although access to the tree is strictly read-only I am not 100% sure it's safe. It has worked when I've tested it but I know this is no guarantee of correctness! Questions Do I need to make a private copy of the tree for each thread? Even if it is safe, are there performance problems of accessing the same memory from multiple threads?

Read the article
how does MySQL implement the "group by"?

- by user188916

I read from the MySQL Reference Manual and find that when it can take use of index,it just do index scan,other it will create tmp tables and do things like filesort. And I also read from other article that the "Group By" result will sort by group by columns by default,if "order by null" clause added,it won't don filesort. The difference can be found from the "explain ..." clause. so my problem is:what is the difference between "group by" clause that with "order by null" and which doesn't have? I try to use profiling to see what mysql do on the background,and only see result like: result for group clause without order by null: |preparing | 0.000016 | | Creating tmp table | 0.000048 | | executing | 0.000009 | | Copying to tmp table | 0.000109 | **| Sorting result | 0.000023 |** | Sending data | 0.000027 | result for clause with "order by null": preparing | 0.000016 | | Creating tmp table | 0.000052 | | executing | 0.000009 | | Copying to tmp table | 0.000114 | | Sending data | 0.000028 | So I guess what MySQL do when the "order by null" added,it does not use filesort algorithm,maybe when it creates the tmp table,it uses index as well,and then use the index to do group by operation,when completed,it just read result from the table rows and does not sort the result. But my original opinion is that MySQL can use quicksort to sort the items and then do group by,so the result will be sorted as well. Any opinion appreciated,thanks.

Read the article
Thread management advice - Is TPL a good idea?

- by Ian

I'm hoping to get some advice on the use of thread managment and hopefully the task parallel library, because I'm not sure I've been going down the correct route. Probably best is that I give an outline of what I'm trying to do. Given a Problem I need to generate a Solution using a heuristic based algorithm. I start of by calculating a base solution, this operation I don't think can be parallelised so we don't need to worry about. Once the inital solution has been generated, I want to trigger n threads, which attempt to find a better solution. These threads need to do a couple of things: They need to be initalized with a different 'optimization metric'. In other words they are attempting to optimize different things, with a precedence level set within code. This means they all run slightly different calculation engines. I'm not sure if I can do this with the TPL.. If one of the threads finds a better solution that the currently best known solution (which needs to be shared across all threads) then it needs to update the best solution, and force a number of other threads to restart (again this depends on precedence levels of the optimization metrics). I may also wish to combine certain calculations across threads (e.g. keep a union of probabilities for a certain approach to the problem). This is probably more optional though. The whole system needs to be thread safe obviously and I want it to be running as fast as possible. I tried quite an implementation that involved managing my own threads and shutting them down etc, but it started getting quite complicated, and I'm now wondering if the TPL might be better. I'm wondering if anyone can offer any general guidance? Thanks...

Read the article
reading newlines with FORMAT statement

- by peter.murray.rust

I'm writing a preprocessor and postprocessor for Fortran input and output using FORMAT-like statements (there are reasons not to use a FORTRAN library). I want to treat the new line ("/") character correctly. I don't have a Fortran compiler immediately to hand. Is there a simple algorithm for working out how many newlines are written or consumed (This post just gives reading examples) [Please assume a FORTRAN77-like mentality in the FORTRAN code and correct any FORTRAN syntax on my part] UPDATE: no comments yet so I am reduced to finding a compiler and running it myself. I'll post the answers if I'm not beaten to it. No-one commented I had the format syntax wrong. I've changed it but there may still be errors Assume datafile 1 a b c d etc... (a) does the READ command always consume a newline? does READ(1, '(A)') A READ(1, '(A)') B give A='a' and B='b' (b) what does READ(1,'(A,/)') A READ(1,'(A)') B give for B? (I would assume 'c') (c) what does READ(1, '(/)') READ(1, '(A)') A give for A (is it 'b' or 'c') (d) what does READ(1,'(A,/,A)') A, B READ(1,'(A)') C give for A and B and C(can I assume 'a' and 'b' and 'c') (e) what does READ(1,'(A,/,/,A)') A, B READ(1,'(A)') C give for A and B and C(can I assume 'a' and 'c' and 'd')? Are there any cases in which the '/' is redundant?

Read the article
Random forests for short texts

- by Jasie

Hi all, I've been reading about Random Forests (1,2) because I think it'd be really cool to be able to classify a set of 1,000 sentences into pre-defined categories. I'm wondering if someone can explain to me the algorithm better, I think the papers are a bit dense. Here's the gist from 1: Overview We assume that the user knows about the construction of single classification trees. Random Forests grows many classification trees. To classify a new object from an input vector, put the input vector down each of the trees in the forest. Each tree gives a classification, and we say the tree "votes" for that class. The forest chooses the classification having the most votes (over all the trees in the forest). Each tree is grown as follows: If the number of cases in the training set is N, sample N cases at random - but with replacement, from the original data. This sample will be the training set for growing the tree. If there are M input variables, a number m « M is specified such that at each node, m variables are selected at random out of the M and the best split on these m is used to split the node. The value of m is held constant during the forest growing. Each tree is grown to the largest extent possible. There is no pruning. So, does this look right? I'd have N = 1,000 training cases (sentences), M = 100 variables (let's say, there are only 100 unique words across all sentences), so the input vector is a bit vector of length 100 corresponding to each word. I randomly sample N = 1000 cases at random (with replacement) to build trees from. I pick some small number of input variables m « M, let's say 10, to build a tree off of. Do I build tree nodes randomly, using all m input variables? How many classification trees do I build? Thanks for the help!

Read the article

< Previous Page | 181 182 183 184 185 186 187 188 189 190 191 192 | Next Page >