Search Results

Search found 8262 results on 331 pages for 'optimization algorithm'.

Page 126/331 | < Previous Page | 122 123 124 125 126 127 128 129 130 131 132 133  | Next Page >

  • List of Big-O for PHP functions?

    - by Kendall Hopkins
    After using PHP for a while now, I've noticed that not all PHP built in functions as fast as expected. Consider the below two possible implementations of a function that finds if a number is prime using a cached array of primes. //very slow for large $prime_array $prime_array = array( 2, 3, 5, 7, 11, 13, .... 104729, ... ); $result_array = array(); foreach( $array_of_number => $number ) { $result_array[$number] = in_array( $number, $large_prime_array ); } //still decent performance for large $prime_array $prime_array => array( 2 => NULL, 3 => NULL, 5 => NULL, 7 => NULL, 11 => NULL, 13 => NULL, .... 104729 => NULL, ... ); foreach( $array_of_number => $number ) { $result_array[$number] = array_key_exists( $number, $large_prime_array ); } This is because in_array is implemented with a linear search O(n) which will linearly slow down as $prime_array grows. Where the array_key_exists function is implemented with a hash lookup O(1) which will not slow down unless the hash table gets extremely populated (in which case it's only O(logn)). So far I've had to discover the big-O's via trial and error, and occasionally looking at the source code. Now for the question... I was wondering if there was a list of the theoretical (or practical) big O times for all* the PHP built in functions. *or at least the interesting ones For example find it very hard to predict what the big O of functions listed because the possible implementation depends on unknown core data structures of PHP: array_merge, array_merge_recursive, array_reverse, array_intersect, array_combine, str_replace (with array inputs), etc.

    Read the article

  • MongoDB vs CouchDB (Speed optimization)

    - by Edward83
    Hi! I made some tests of speed to compare MongoDB and CouchDB. Only inserts were while testing. I got MongoDB 15x faster than CouchDB. I know that it is because of sockets vs http. But, it is very interesting for me how can I optimize inserts in CouchDB? Test platform: Windows XP SP3 32 bit. I used last versions of MongoDB, MongoDB C# Driver and last version of installation package of CouchDB for Windows. Thanks!

    Read the article

  • Most Frequent 3 page sequence in a weblog

    - by Sundararajan S
    Given a web log which consists of fields 'User ' 'Page url'. We have to find out the most frequent 3-page sequence that users takes. There is a time stamp. and it is not guaranteed that the single user access will be logged sequentially it could be like user1 Page1 user2 Pagex user1 Page2 User10 Pagex user1 Page 3 her User1s page sequence is page1- page2- page 3

    Read the article

  • How to output multicolumn html without "widows"?

    - by user314850
    I need to output to HTML a list of categorized links in exactly three columns of text. They must be displayed similar to columns in a newspaper or magazine. So, for example, if there are 20 lines total the first and second columns would contain 7 lines and the last column would contain 6. The list must be dynamic; it will be regularly changed. The tricky part is that the links are categorized with a title and this title cannot be a "widow". If you have a page layout background you'll know that this means the titles cannot be displayed at the bottom of the column -- they must have at least one link underneath them, otherwise they should bump to the next column (I know, technically it should be two lines if I were actually doing page layout, but in this case one is acceptable). I'm having a difficult time figuring out how to get this done. Here's an example of what I mean: Shopping Link 3 Link1 Link 1 Link 4 Link2 Link 2 Link 3 Link 3 Cars Link 1 Music Games Link 2 Link 1 Link 1 Link 2 News As you can see, the "News" title is at the bottom of the middle column, and so is a "widow". This is unacceptable. I could bump it to the next column, but that would create an unnecessarily large amount of white space at the bottom of the second column. What needs to happen instead is that the entire list needs to be re-balanced. I'm wondering if anyone has any tips for how to accomplish this, or perhaps source code or a plug in. Python is preferable, but any language is fine. I'm just trying to get the general concept down.

    Read the article

  • Base X string encoding

    - by Paul Stone
    I'm looking for a routine that will encode a string (stream of bytes) into an arbitrary base/alphabet (like base64 encoding but I get to choose the alphabet). I've seen a few routines that do base X encoding for a number, but not for a string.

    Read the article

  • Why does Java's hashCode() in String use 31 as a multiplier?

    - by jacobko
    In Java, the hash code for a String object is computed as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. Why is 31 used as a multiplier? I understand that the multiplier should be a relatively large prime number. So why not 29, or 37, or even 97?

    Read the article

  • Ideas for designing an automated content tagging system needed

    - by Benjamin Smith
    I am currently designing a website that amongst other is required to display and organise small amounts of text content (mainly quotes, article stubs, etc.). I currently have a database with 250,000+ items and need to come up with a method of tagging each item with relevant tags which will eventually allow for easy searching/browsing of the content for users. A very simplistic idea I have (and one that I believe is employed by some sites that I have been looking to for inspiration (http://www.brainyquote.com/quotes/topics.html)), is to simply search the database for certain words or phrases and use these words as tags for the content. This can easily be extended so that if for example a user wanted to show all items with a theme of love then I would just return a list of items with words and phrases relating to this theme. This would not be hard to implement but does not provide very good results. For example if I were to search for the month 'May' in the database with the aim of then classifying the items returned as realting to the topic of Spring then I would get back all occurrences of the word May, regardless of the semantic meaning. Another shortcoming of this method is that I believe it would be quite hard to automate the process to any large scale. What I really require is a library that can take an item, break it down and analyse the semantic meaning and also return a list of tags that would correctly classify the item. I know this is a lot to ask and I have a feeling I will end up reverting to the aforementioned method but I just thought I should ask if anyone knew of any pre-existing solution. I think that as the items in the database are short then it is probably quite a hard task to analyse any meaning from them however I may be mistaken. Another path to possibly go down would be to use something like amazon turk to outsource the task which may produce good results but would be expensive. Eventually I would like users to be able to (and want to!) tag content and to vote for the most relevant tags, possibly using a gameification mechanic as motivation however this is some way down the line. A temporary fix may be the best thing if this were the route I decided to go down as I could use the rough results I got as the starting point for a more in depth solution. If you've read this far, thanks for sticking with me, I know I'm spitballing but any input would be really helpful. Thanks.

    Read the article

  • question about qsort in c++

    - by davit-datuashvili
    i have following code in c++ #include <iostream> using namespace std; void qsort5(int a[],int n){ int i; int j; if (n<=1) return; for (i=1;i<n;i++) j=0; if (a[i]<a[0]) swap(++j,i,a); swap(0,j,a); qsort5(a,j); qsort(a+j+1,n-j-1); } int main() { return 0; } void swap(int i,int j,int a[]) { int t=a[i]; a[i]=a[j]; a[j]=t; } i have problem 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(13) : error C2780: 'void std::swap(std::basic_string<_Elem,_Traits,_Alloc> &,std::basic_string<_Elem,_Traits,_Alloc> &)' : expects 2 arguments - 3 provided 1> c:\program files\microsoft visual studio 9.0\vc\include\xstring(2203) : see declaration of 'std::swap' 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(13) : error C2780: 'void std::swap(std::pair<_Ty1,_Ty2> &,std::pair<_Ty1,_Ty2> &)' : expects 2 arguments - 3 provided 1> c:\program files\microsoft visual studio 9.0\vc\include\utility(76) : see declaration of 'std::swap' 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(13) : error C2780: 'void std::swap(_Ty &,_Ty &)' : expects 2 arguments - 3 provided 1> c:\program files\microsoft visual studio 9.0\vc\include\utility(16) : see declaration of 'std::swap' 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(14) : error C2780: 'void std::swap(std::basic_string<_Elem,_Traits,_Alloc> &,std::basic_string<_Elem,_Traits,_Alloc> &)' : expects 2 arguments - 3 provided 1> c:\program files\microsoft visual studio 9.0\vc\include\xstring(2203) : see declaration of 'std::swap' 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(14) : error C2780: 'void std::swap(std::pair<_Ty1,_Ty2> &,std::pair<_Ty1,_Ty2> &)' : expects 2 arguments - 3 provided 1> c:\program files\microsoft visual studio 9.0\vc\include\utility(76) : see declaration of 'std::swap' 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(14) : error C2780: 'void std::swap(_Ty &,_Ty &)' : expects 2 arguments - 3 provided 1> c:\program files\microsoft visual studio 9.0\vc\include\utility(16) : see declaration of 'std::swap' 1>c:\users\dato\documents\visual studio 2008\projects\qsort5\qsort5\qsort5.cpp(16) : error C2661: 'qsort' : no overloaded function takes 2 arguments 1>Build log was saved at "file://c:\Users\dato\Documents\Visual Studio 2008\Projects\qsort5\qsort5\Debug\BuildLog.htm" please help

    Read the article

  • Convert arbitrary size of byte[] to BigInteger[] and then safely convert back to exactly the same by

    - by PatlaDJ
    I believe conversion exactly to BigInteger[] would be optimal in my case. Anyone had done or found this written in Java and willing to share? So imagine I have arbitrary size byte[] = {0xff,0x3e,0x12,0x45,0x1d,0x11,0x2a,0x80,0x81,0x45,0x1d,0x11,0x2a,0x80,0x81} How do I convert it to array of BigInteger's and then be able to recover it back the original byte array safely? ty in advance.

    Read the article

  • How does Amazon's Statistically Improbable Phrases work?

    - by ??iu
    How does something like Statistically Improbable Phrases work? According to amazon: Amazon.com's Statistically Improbable Phrases, or "SIPs", are the most distinctive phrases in the text of books in the Search Inside!™ program. To identify SIPs, our computers scan the text of all books in the Search Inside! program. If they find a phrase that occurs a large number of times in a particular book relative to all Search Inside! books, that phrase is a SIP in that book. SIPs are not necessarily improbable within a particular book, but they are improbable relative to all books in Search Inside!. For example, most SIPs for a book on taxes are tax related. But because we display SIPs in order of their improbability score, the first SIPs will be on tax topics that this book mentions more often than other tax books. For works of fiction, SIPs tend to be distinctive word combinations that often hint at important plot elements. For instance, for Joel's first book, the SIPs are: leaky abstractions, antialiased text, own dog food, bug count, daily builds, bug database, software schedules One interesting complication is that these are phrases of either 2 or 3 words. This makes things a little more interesting because these phrases can overlap with or contain each other.

    Read the article

  • What is it about Fibonacci numbers?

    - by Ian Bishop
    Fibonacci numbers have become a popular introduction to recursion for Computer Science students and there's a strong argument that they persist within nature. For these reasons, many of us are familiar with them. They also exist within Computer Science elsewhere too; in surprisingly efficient data structures and algorithms based upon the sequence. There are two main examples that come to mind: Fibonacci heaps which have better amortized running time than binomial heaps. Fibonacci search which shares O(log N) running time with binary search on an ordered array. Is there some special property of these numbers that gives them an advantage over other numerical sequences? Is it a density quality? What other possible applications could they have? It seems strange to me as there are many natural number sequences that occur in other recursive problems, but I've never seen a Catalan heap.

    Read the article

  • Interview question: How do I detect a loop in this linked list?

    - by jjujuma
    Say you have a linked list structure in Java. It's made up of Nodes: class Node { Node next; // some user data } and each Node points to the next node, except for the last Node, which has null for next. Say there is a possibility that the list can contain a loop - i.e. the final Node, instead of having a null, has a reference to one of the nodes in the list which came before it. What's the best way of writing boolean hasLoop(Node first) which would return true if the given Node is the first of a list with a loop, and false otherwise? How could you write so that it takes a constant amount of space and a reasonable amount of time? Here's a picture of what a list with a loop looks like: Node->Node->Node->Node->Node->Node--\ \ | ----------------

    Read the article

  • how to develop a program to minimize errors in human transcription of hand written surveys

    - by Alex. S.
    I need to develop custom software to do surveys. Questions may be of multiple choice, or free text in a very few cases. I was asked to design a subsystem to check if there is any error in the manual data entry for the multiple choices part. We're trying to speed up the user data entry process and to minimize human input differences between digital forms and the original questionnaires. The surveys are filled with handwritten marks and text by human interviewers, so it's possible to find hard to read marks, or also the user could accidentally select a different value in some question, and we would like to avoid that. The software must include some automatic control to detect possible typing differences. Each answer of the multiple choice questions has the same probability of being selected. This question has two parts: The GUI. The most simple thing I have in mind is to implement the most usable design of the questions display: use of large and readable fonts and space generously the choices. Is there something else? For faster input, I would like to use drop down lists (favoring keyboard over mouse). Given the questions are grouped in sections, I would like to show the answers selected for the questions of that section, but this could slow down the process. Any other ideas? The error checking subsystem. What else can I do to minimize or to check human typos in the multiple choice questions? Is this a solvable problem? is there some statistical methodology to check values that were entered by the users are the same from the hand filled forms? For example, let's suppose the survey has 5 questions, and each has 4 options. Let's say I have n survey forms filled in paper by interviewers, and they're ready to be entered in the software, then how to minimize the accidental differences that can have the manual transcription of the n surveys, without having to double check everything in the 5 questions of the n surveys? My first suggestion is that at the end of the processing of all the hand filled forms, the software could choose some forms randomly to make a double check of the responses in a few instances, but on what criteria can I make this selection? This validation would be enough to cover everything in a significant way? The actual survey is nation level and it has 56 pages with over 200 questions in total, so it will be a lot of hand written pages by many people, and the intention is to reduce the likelihood of errors and to optimize speed in the data entry process. The surveys must filled in paper first, given the complications of taking laptops or handhelds with the interviewers.

    Read the article

  • question about api functions

    - by davit-datuashvili
    i have question we have API functions in java can user create it's own function and add to his java IDE? for example i am using netbeans can i create my own function add to netbean IDE?let say create binary function or something else thanks

    Read the article

  • Is there a name for the technique of using base-2 numbers to encode a list of unique options?

    - by Lunatik
    Apologies for the rather vague nature of this question, I've never been taught programming and Google is rather useless to a self-help guy like me in this case as the key words are pretty ambiguous. I am writing a couple of functions that encode and decode a list of options into a Long so they can easily be passed around the application, you know this kind of thing: 1 - Apple 2 - Orange 4 - Banana 8 - Plum etc. In this case the number 11 would represent Apple, Orange & Plum. I've got it working but I see this used all the time so assume there is a common name for the technique, and no doubt all sorts of best practice and clever algorithms that are at the moment just out of my reach.

    Read the article

  • Compact data structure for storing a large set of integral values

    - by Odrade
    I'm working on an application that needs to pass around large sets of Int32 values. The sets are expected to contain ~1,000,000-50,000,000 items, where each item is a database key in the range 0-50,000,000. I expect distribution of ids in any given set to be effectively random over this range. The operations I need on the set are dirt simple: Add a new value Iterate over all of the values. There is a serious concern about the memory usage of these sets, so I'm looking for a data structure that can store the ids more efficiently than a simple List<int>or HashSet<int>. I've looked at BitArray, but that can be wasteful depending on how sparse the ids are. I've also considered a bitwise trie, but I'm unsure how to calculate the space efficiency of that solution for the expected data. A Bloom Filter would be great, if only I could tolerate the false negatives. I would appreciate any suggestions of data structures suitable for this purpose. I'm interested in both out-of-the-box and custom solutions. EDIT: To answer your questions: No, the items don't need to be sorted By "pass around" I mean both pass between methods and serialize and send over the wire. I clearly should have mentioned this. There could be a decent number of these sets in memory at once (~100).

    Read the article

  • Converting to a column oriented array in Java

    - by halfwarp
    Although I have Java in the title, this could be for any OO language. I'd like to know a few new ideas to improve the performance of something I'm trying to do. I have a method that is constantly receiving an Object[] array. I need to split the Objects in this array through multiple arrays (List or something), so that I have an independent list for each column of all arrays the method receives. Example: List<List<Object>> column-oriented = new ArrayList<ArrayList<Object>>(); public void newObject(Object[] obj) { for(int i = 0; i < obj.length; i++) { column-oriented.get(i).add(obj[i]); } } Note: For simplicity I've omitted the initialization of objects and stuff. The code I've shown above is slow of course. I've already tried a few other things, but would like to hear some new ideas. How would you do this knowing it's very performance sensitive?

    Read the article

  • algorithm q: Fuzzy matching of structured data

    - by user86432
    I have a fairly small corpus of structured records sitting in a database. Given a tiny fraction of the information contained in a single record, submitted via a web form (so structured in the same way as the table schema), (let us call it the test record) I need to quickly draw up a list of the records that are the most likely matches for the test record, as well as provide a confidence estimate of how closely the search terms match a record. The primary purpose of this search is to discover whether someone is attempting to input a record that is duplicate to one in the corpus. There is a reasonable chance that the test record will be a dupe, and a reasonable chance the test record will not be a dupe. The records are about 12000 bytes wide and the total count of records is about 150,000. There are 110 columns in the table schema and 95% of searches will be on the top 5% most commonly searched columns. The data is stuff like names, addresses, telephone numbers, and other industry specific numbers. In both the corpus and the test record it is entered by hand and is semistructured within an individual field. You might at first blush say "weight the columns by hand and match word tokens within them", but it's not so easy. I thought so too: if I get a telephone number I thought that would indicate a perfect match. The problem is that there isn't a single field in the form whose token frequency does not vary by orders of magnitude. A telephone number might appear 100 times in the corpus or 1 time in the corpus. The same goes for any other field. This makes weighting at the field level impractical. I need a more fine-grained approach to get decent matching. My initial plan was to create a hash of hashes, top level being the fieldname. Then I would select all of the information from the corpus for a given field, attempt to clean up the data contained in it, and tokenize the sanitized data, hashing the tokens at the second level, with the tokens as keys and frequency as value. I would use the frequency count as a weight: the higher the frequency of a token in the reference corpus, the less weight I attach to that token if it is found in the test record. My first question is for the statisticians in the room: how would I use the frequency as a weight? Is there a precise mathematical relationship between n, the number of records, f(t), the frequency with which a token t appeared in the corpus, the probability o that a record is an original and not a duplicate, and the probability p that the test record is really a record x given the test and x contain the same t in the same field? How about the relationship for multiple token matches across multiple fields? Since I sincerely doubt that there is, is there anything that gets me close but is better than a completely arbitrary hack full of magic factors? Barring that, has anyone got a way to do this? I'm especially keen on other suggestions that do not involve maintaining another table in the database, such as a token frequency lookup table :). This is my first post on StackOverflow, thanks in advance for any replies you may see fit to give.

    Read the article

  • Whats the difference between Paxos and W+R>=N in Cassandra?

    - by user1128016
    Dynamo-like databases (e.g. Cassandra) provide ability to enforce consistency by means of quorum, i.e. a number of synchronously written replicas (W) and a number of replicas to read (R) should be chosen in such a way that W+RN where N is a replication factor. On the other hand, PAXOS-based systems like Zookeeper are also used as a consistent fault-tolerant storage. What is the difference between these two approaches? Does PAXOS provide guarantees that are not provided by W+RN schema?

    Read the article

  • How can I group an array of rectangles into "Islands" of connected regions?

    - by Eric
    The problem I have an array of java.awt.Rectangles. For those who are not familiar with this class, the important piece of information is that they provide an .intersects(Rectangle b) function. I would like to write a function that takes this array of Rectangles, and breaks it up into groups of connected rectangles. Lets say for example, that these are my rectangles (constructor takes the arguments x, y, width,height): Rectangle[] rects = new Rectangle[] { new Rectangle(0, 0, 4, 2), //A new Rectangle(1, 1, 2, 4), //B new Rectangle(0, 4, 8, 2), //C new Rectangle(6, 0, 2, 2) //D } A quick drawing shows that A intersects B and B intersects C. D intersects nothing. A tediously drawn piece of ascii art does the job too: +-------+ +---+ ¦A+---+ ¦ ¦ D ¦ +-+---+-+ +---+ ¦ B ¦ +-+---+---------+ ¦ +---+ C ¦ +---------------+ Therefore, the output of my function should be: new Rectangle[][]{ new Rectangle[] {A,B,C}, new Rectangle[] {D} } The failed code This was my attempt at solving the problem: public List<Rectangle> getIntersections(ArrayList<Rectangle> list, Rectangle r) { List<Rectangle> intersections = new ArrayList<Rectangle>(); for(Rectangle rect : list) { if(r.intersects(rect)) { list.remove(rect); intersections.add(rect); intersections.addAll(getIntersections(list, rect)); } } return intersections; } public List<List<Rectangle>> mergeIntersectingRects(Rectangle... rectArray) { List<Rectangle> allRects = new ArrayList<Rectangle>(rectArray); List<List<Rectangle>> groups = new ArrayList<ArrayList<Rectangle>>(); for(Rectangle rect : allRects) { allRects.remove(rect); ArrayList<Rectangle> group = getIntersections(allRects, rect); group.add(rect); groups.add(group); } return groups; } Unfortunately, there seems to be an infinite recursion loop going on here. My uneducated guess would be that java does not like me doing this: for(Rectangle rect : allRects) { allRects.remove(rect); //... } Can anyone shed some light on the issue?

    Read the article

  • How do I find all paths through a set of given nodes in a DAG?

    - by Hanno Fietz
    I have a list of items (blue nodes below) which are categorized by the users of my application. The categories themselves can be grouped and categorized themselves. The resulting structure can be represented as a Directed Acyclic Graph (DAG) where the items are sinks at the bottom of the graph's topology and the top categories are sources. Note that while some of the categories might be well defined, a lot is going to be user defined and might be very messy. Example: On that structure, I want to perform the following operations: find all items (sinks) below a particular node (all items in Europe) find all paths (if any) that pass through all of a set of n nodes (all items sent via SMTP from example.com) find all nodes that lie below all of a set of nodes (intersection: goyish brown foods) The first seems quite straightforward: start at the node, follow all possible paths to the bottom and collect the items there. However, is there a faster approach? Remembering the nodes I already passed through probably helps avoiding unnecessary repetition, but are there more optimizations? How do I go about the second one? It seems that the first step would be to determine the height of each node in the set, as to determine at which one(s) to start and then find all paths below that which include the rest of the set. But is this the best (or even a good) approach? The graph traversal algorithms listed at Wikipedia all seem to be concerned with either finding a particular node or the shortest or otherwise most effective route between two nodes. I think both is not what I want, or did I just fail to see how this applies to my problem? Where else should I read?

    Read the article

< Previous Page | 122 123 124 125 126 127 128 129 130 131 132 133  | Next Page >