o logn - Page 2 - Developer IT

nth smallest number among two databases of size n each using divide and conquer

- by urfriend

we have two databases of size n containing numbers without repeats. So, in total we have 2n elements. They can be accessed through a query to one database at a time. The query is such that you give it a k and it returns kth smallest entry in that database. we need to find nth smallest entry among all the 2n elements in O(logn) queries. the idea is to use divide and conquer but i need some help thinking through this. thanks!

Read the article

Space requirements of a merge-sort

- by Arkaitz Jimenez

I'm trying to understand the space requirements for a Mergesort, O(n). I see that time requirements are basically, amount of levels(logn) * merge(n) so that makes (n log n). Now, we are still allocating n per level, in 2 different arrays, left and right. I do understand that the key here is that when the recursive functions return the space gets deallocated, but I'm not seeing it too obvious. Besides, all the info I find, just states space required is O(n) but don't explain it. Any hint? function merge_sort(m) if length(m) = 1 return m var list left, right, result var integer middle = length(m) / 2 for each x in m up to middle add x to left for each x in m after middle add x to right left = merge_sort(left) right = merge_sort(right) result = merge(left, right) return result

Read the article

Balanced Search Tree Query, Asymtotic Analysis..

- by AGeek

Hi, The situation is as follows:- We have n number and we have print them in sorted order. We have access to balanced dictionary data structure, which supports the operations serach, insert, delete, minimum, maximum each in O(log n) time. We want to retrieve the numbers in sorted order in O(n log n) time using only the insert and in-order traversal. The answer to this is:- Sort() initialize(t) while(not EOF) read(x) insert(x,t); Traverse(t); Now the query is if we read the elements in time "n" and then traverse the elements in "log n"(in-order traversal) time,, then the total time for this algorithm (n+logn)time, according to me.. Please explain the follow up of this algorithm for the time calculation.. How it will sort the list in O(nlogn) time?? Thanks.

Read the article

How to find kth minimal element in the union of two sorted arrays?

- by Michael

This is a homework question. They say it takes O(logN + logM) where N and M are the arrays lengths. Let's name the arrays a and b. Obviously we can ignore all a[i] and b[i] where i k. First let's compare a[k/2] and b[k/2]. Let b[k/2] a[k/2]. Therefore we can discard also all b[i], where i k/2. Now we have all a[i], where i < k and all b[i], where i < k/2 to find the answer. What is the next step?

Read the article

Big-O of PHP functions?

- by Kendall Hopkins

After using PHP for a while now, I've noticed that not all PHP built in functions as fast as expected. Consider the below two possible implementations of a function that finds if a number is prime using a cached array of primes. //very slow for large $prime_array $prime_array = array( 2, 3, 5, 7, 11, 13, .... 104729, ... ); $result_array = array(); foreach( $array_of_number => $number ) { $result_array[$number] = in_array( $number, $large_prime_array ); } //still decent performance for large $prime_array $prime_array => array( 2 => NULL, 3 => NULL, 5 => NULL, 7 => NULL, 11 => NULL, 13 => NULL, .... 104729 => NULL, ... ); foreach( $array_of_number => $number ) { $result_array[$number] = array_key_exists( $number, $large_prime_array ); } This is because in_array is implemented with a linear search O(n) which will linearly slow down as $prime_array grows. Where the array_key_exists function is implemented with a hash lookup O(1) which will not slow down unless the hash table gets extremely populated (in which case it's only O(logn)). So far I've had to discover the big-O's via trial and error, and occasionally looking at the source code. Now for the question... I was wondering if there was a list of the theoretical (or practical) big O times for all* the PHP built in functions. *or at least the interesting ones For example find it very hard to predict what the big O of functions listed because the possible implementation depends on unknown core data structures of PHP: array_merge, array_merge_recursive, array_reverse, array_intersect, array_combine, str_replace (with array inputs), etc.

Read the article

Unexpected performance curve from CPython merge sort

- by vkazanov

I have implemented a naive merge sorting algorithm in Python. Algorithm and test code is below: import time import random import matplotlib.pyplot as plt import math from collections import deque def sort(unsorted): if len(unsorted) <= 1: return unsorted to_merge = deque(deque([elem]) for elem in unsorted) while len(to_merge) > 1: left = to_merge.popleft() right = to_merge.popleft() to_merge.append(merge(left, right)) return to_merge.pop() def merge(left, right): result = deque() while left or right: if left and right: elem = left.popleft() if left[0] > right[0] else right.popleft() elif not left and right: elem = right.popleft() elif not right and left: elem = left.popleft() result.append(elem) return result LOOP_COUNT = 100 START_N = 1 END_N = 1000 def test(fun, test_data): start = time.clock() for _ in xrange(LOOP_COUNT): fun(test_data) return time.clock() - start def run_test(): timings, elem_nums = [], [] test_data = random.sample(xrange(100000), END_N) for i in xrange(START_N, END_N): loop_test_data = test_data[:i] elapsed = test(sort, loop_test_data) timings.append(elapsed) elem_nums.append(len(loop_test_data)) print "%f s --- %d elems" % (elapsed, len(loop_test_data)) plt.plot(elem_nums, timings) plt.show() run_test() As much as I can see everything is OK and I should get a nice N*logN curve as a result. But the picture differs a bit: Things I've tried to investigate the issue: PyPy. The curve is ok. Disabled the GC using the gc module. Wrong guess. Debug output showed that it doesn't even run until the end of the test. Memory profiling using meliae - nothing special or suspicious. ` I had another implementation (a recursive one using the same merge function), it acts the similar way. The more full test cycles I create - the more "jumps" there are in the curve. So how can this behaviour be explained and - hopefully - fixed? UPD: changed lists to collections.deque UPD2: added the full test code UPD3: I use Python 2.7.1 on a Ubuntu 11.04 OS, using a quad-core 2Hz notebook. I tried to turn of most of all other processes: the number of spikes went down but at least one of them was still there.

Read the article

Designing small comparable objects

- by Thomas Ahle

Intro Consider you have a list of key/value pairs: (0,a) (1,b) (2,c) You have a function, that inserts a new value between two current pairs, and you need to give it a key that keeps the order: (0,a) (0.5,z) (1,b) (2,c) Here the new key was chosen as the average between the average of keys of the bounding pairs. The problem is, that you list may have milions of inserts. If these inserts are all put close to each other, you may end up with keys such to 2^(-1000000), which are not easily storagable in any standard nor special number class. The problem How can you design a system for generating keys that: Gives the correct result (larger/smaller than) when compared to all the rest of the keys. Takes up only O(logn) memory (where n is the number of items in the list). My tries First I tried different number classes. Like fractions and even polynomium, but I could always find examples where the key size would grow linear with the number of inserts. Then I thought about saving pointers to a number of other keys, and saving the lower/greater than relationship, but that would always require at least O(sqrt) memory and time for comparison. Extra info: Ideally the algorithm shouldn't break when pairs are deleted from the list.

Read the article

List of Big-O for PHP functions?

- by Kendall Hopkins

After using PHP for a while now, I've noticed that not all PHP built in functions as fast as expected. Consider the below two possible implementations of a function that finds if a number is prime using a cached array of primes. //very slow for large $prime_array $prime_array = array( 2, 3, 5, 7, 11, 13, .... 104729, ... ); $result_array = array(); foreach( $array_of_number => $number ) { $result_array[$number] = in_array( $number, $large_prime_array ); } //still decent performance for large $prime_array $prime_array => array( 2 => NULL, 3 => NULL, 5 => NULL, 7 => NULL, 11 => NULL, 13 => NULL, .... 104729 => NULL, ... ); foreach( $array_of_number => $number ) { $result_array[$number] = array_key_exists( $number, $large_prime_array ); } This is because in_array is implemented with a linear search O(n) which will linearly slow down as $prime_array grows. Where the array_key_exists function is implemented with a hash lookup O(1) which will not slow down unless the hash table gets extremely populated (in which case it's only O(logn)). So far I've had to discover the big-O's via trial and error, and occasionally looking at the source code. Now for the question... I was wondering if there was a list of the theoretical (or practical) big O times for all* the PHP built in functions. *or at least the interesting ones For example find it very hard to predict what the big O of functions listed because the possible implementation depends on unknown core data structures of PHP: array_merge, array_merge_recursive, array_reverse, array_intersect, array_combine, str_replace (with array inputs), etc.

Read the article

Heap Algorithmic Issue

- by OberynMarDELL

I am having this algorithmic problem that I want to discuss about. Its not about find a solution but about optimization in terms of runtime. So here it is: Suppose we have a race court of Length L and a total of N cars that participate on the race. The race rules are simple. Once a car overtakes an other car the second car is eliminated from the race. The race ends when no more overtakes are possible to happen. The tricky part is that the k'th car has a starting point x[k] and a velocity v[k]. The points are given in an ascending order, but the velocities may differ. What I've done so far: Given that a car can get overtaken only by its previous, I calculated the time that it takes for each car to reach its next one t = (x[i] - x[i+1])/(v[i] - v[i+1]) and I insert these times onto a min heap in O(n log n). So in theory I have to pop the first element in O(logn), find its previous, pop it as well , update its time and insert it in the heap once more, much like a priority queue. My main problem is how I can access specific points of a heap in O(log n) or faster in order to keep the complexity in O(n log n) levels. This program should be written on Haskell so I would like to keep things simple as far as possible EDIT: I Forgot to write the actual point of the race. The goal is to find the order in which cars exit the game

Read the article

Passing the CAML thru the EY of the NEEDL

- by PointsToShare

© 2011 By: Dov Trietsch. All rights reserved Passing the CAML thru the EY of the NEEDL Definitions: CAML (Collaborative Application Markup Language) is an XML based markup language used in Microsoft SharePoint technologies Anonymous: A camel is a horse designed by committee Dov Trietsch: A CAML is a HORS designed by Microsoft I was advised against putting any Camel and Sphinx rhymes in here. Look it up in Google! _____ Now that we have dispensed with the dromedary jokes (BTW, I have many more, but they are not fit to print!), here is an interesting problem and its solution. We have built a list where the title must be kept unique so I needed to verify the existence (or absence) of a list item with a particular title. Two methods came to mind: 1: Span the list until the title is found (result = found) or until the list ends (result = not found). This is an algorithm of complexity O(N) and for long lists it is a performance sucker. 2: Use a CAML query instead. Here, for short list we’ll encounter some overhead, but because the query results in an SQL query on the content database, it is of complexity O(LogN), which is significantly better and scales perfectly. Obviously I decided to go with the latter and this is where the CAML s--t hit the fan. A CAML query returns a SPListItemCollection and I simply checked its Count. If it was 0, the item did not already exist and it was safe to add a new item with the given title. Otherwise I cancelled the operation and warned the user. The trouble was that I always got a positive. Most of the time a false positive. The count was greater than 0 regardles of the title I checked (except when the list was empty, which happens only once). This was very disturbing indeed. To solve my immediate problem which was speedy delivery, I reverted to the “Span the list” approach, but the problem bugged me, so I wrote a little console app by which I tested and tweaked and tested, time and again, until I found the solution. Yes, one can pass the proverbial CAML thru the ey of the needle (e’s missing on purpose). So here are my conclusions: CAML that does not work: Note: QT is my quote: char QT = Convert.ToChar((int)34); string titleQuery = "<Query>><Where><Eq>"; titleQuery += "<FieldRef Name=" + QT + "Title" + QT + "/>"; titleQuery += "<Value Type=" + QT + "Text" + QT + ">" + uniqueID + "</Value></Eq></Where></Query>"; titleQuery += "<ViewFields><FieldRef Name=" + QT + "Title" + QT + "/></ViewFields>"; Why? Even though U2U generates it, the <Query> and </Query> tags do not belong in the query that you pass. Start your query with the <Where> clause. Also the <ViewFiels> clause does not belong. I used this clause to limit the returned collection to a single column, and I still wish to do it. I’ll show how this is done a bit later. When you use the <Query> </Query> tags in you query, it’s as if you did not specify the query at all. What you get is the all inclusive default query for the list. It returns evey column and every item. It is expensive for both server and network because it does all the extra processing and eats plenty of bandwidth. Now, here is the CAML that works string titleQuery = "<Where><Eq>"; titleQuery += "<FieldRef Name=" + QT + "Title" + QT + "/>"; titleQuery += "<Value Type=" + QT + "Text" + QT + ">" + uniqueID + "</Value></Eq></Where>"; You’ll also notice that inside the unusable <ViewFields> clause above, we have a <FieldRef> clause. This is what we pass to the SPQuery object. Here is how: SPQuery query = new SPQuery(); query.Query = titleQuery; query.ViewFields = "<FieldRef Name=" + QT + "Title" + QT + "/>"; query.RowLimit = 1; SPListItemCollection col = masterList.GetItems(query); Two thing to note: we enter the view fields into the SPQuery object and we also limited the number of rows that the query returns. The latter is not always done, but in an existence test, there is no point in returning hundreds of rows. The query will now return one item or none, which is all we need in order to verify the existence (or non-existence) of items. Limiting the number of columns and the number of rows is a great performance enhancer. That’s all folks!!

Developer IT