Search Results

Search found 20224 results on 809 pages for 'query optimization'.

Page 16/809 | < Previous Page | 12 13 14 15 16 17 18 19 20 21 22 23 | Next Page >

Prevent full table scan for query with multiple where clauses

- by Dave Jarvis

A while ago I posted a message about optimizing a query in MySQL. I have since ported the data and query to PostgreSQL, but now PostgreSQL has the same problem. The solution in MySQL was to force the optimizer to not optimize using STRAIGHT_JOIN. PostgreSQL offers no such option. Here is the explain: Here is the query: SELECT avg(d.amount) AS amount, y.year FROM station s, station_district sd, year_ref y, month_ref m, daily d LEFT JOIN city c ON c.id = 10663 WHERE -- Find all the stations within a specific unit radius ... -- 6371.009 * SQRT( POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) + (COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) * POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2)) ) <= 50 AND -- Ignore stations outside the given elevations -- s.elevation BETWEEN 0 AND 2000 AND sd.id = s.station_district_id AND -- Gather all known years for that station ... -- y.station_district_id = sd.id AND -- The data before 1900 is shaky; insufficient after 2009. -- y.year BETWEEN 1980 AND 2000 AND -- Filtered by all known months ... -- m.year_ref_id = y.id AND m.month = 12 AND -- Whittled down by category ... -- m.category_id = '001' AND -- Into the valid daily climate data. -- m.id = d.month_ref_id AND d.daily_flag_id <> 'M' GROUP BY y.year It appears as though PostgreSQL is looking at the DAILY table first, which is simply not the right way to go about this query as there are nearly 300 million rows. How do I force PostgreSQL to start at the CITY table? Thank you!

Read the article
Java code optimization on matrix windowing computes in more time

- by rano

I have a matrix which represents an image and I need to cycle over each pixel and for each one of those I have to compute the sum of all its neighbors, ie the pixels that belong to a window of radius rad centered on the pixel. I came up with three alternatives: The simplest way, the one that recomputes the window for each pixel The more optimized way that uses a queue to store the sums of the window columns and cycling through the columns of the matrix updates this queue by adding a new element and removing the oldes The even more optimized way that does not need to recompute the queue for each row but incrementally adjusts a previously saved one I implemented them in c++ using a queue for the second method and a combination of deques for the third (I need to iterate through their elements without destructing them) and scored their times to see if there was an actual improvement. it appears that the third method is indeed faster. Then I tried to port the code to Java (and I must admit that I'm not very comfortable with it). I used ArrayDeque for the second method and LinkedLists for the third resulting in the third being inefficient in time. Here is the simplest method in C++ (I'm not posting the java version since it is almost identical): void normalWindowing(int mat[][MAX], int cols, int rows, int rad){ int i, j; int h = 0; for (i = 0; i < rows; ++i) { for (j = 0; j < cols; j++) { h = 0; for (int ry =- rad; ry <= rad; ry++) { int y = i + ry; if (y >= 0 && y < rows) { for (int rx =- rad; rx <= rad; rx++) { int x = j + rx; if (x >= 0 && x < cols) { h += mat[y][x]; } } } } } } } Here is the second method (the one optimized through columns) in C++: void opt1Windowing(int mat[][MAX], int cols, int rows, int rad){ int i, j, h, y, col; queue<int>* q = NULL; for (i = 0; i < rows; ++i) { if (q != NULL) delete(q); q = new queue<int>(); h = 0; for (int rx = 0; rx <= rad; rx++) { if (rx < cols) { int mem = 0; for (int ry =- rad; ry <= rad; ry++) { y = i + ry; if (y >= 0 && y < rows) { mem += mat[y][rx]; } } q->push(mem); h += mem; } } for (j = 1; j < cols; j++) { col = j + rad; if (j - rad > 0) { h -= q->front(); q->pop(); } if (j + rad < cols) { int mem = 0; for (int ry =- rad; ry <= rad; ry++) { y = i + ry; if (y >= 0 && y < rows) { mem += mat[y][col]; } } q->push(mem); h += mem; } } } } And here is the Java version: public static void opt1Windowing(int [][] mat, int rad){ int i, j = 0, h, y, col; int cols = mat[0].length; int rows = mat.length; ArrayDeque<Integer> q = null; for (i = 0; i < rows; ++i) { q = new ArrayDeque<Integer>(); h = 0; for (int rx = 0; rx <= rad; rx++) { if (rx < cols) { int mem = 0; for (int ry =- rad; ry <= rad; ry++) { y = i + ry; if (y >= 0 && y < rows) { mem += mat[y][rx]; } } q.addLast(mem); h += mem; } } j = 0; for (j = 1; j < cols; j++) { col = j + rad; if (j - rad > 0) { h -= q.peekFirst(); q.pop(); } if (j + rad < cols) { int mem = 0; for (int ry =- rad; ry <= rad; ry++) { y = i + ry; if (y >= 0 && y < rows) { mem += mat[y][col]; } } q.addLast(mem); h += mem; } } } } I recognize this post will be a wall of text. Here is the third method in C++: void opt2Windowing(int mat[][MAX], int cols, int rows, int rad){ int i = 0; int j = 0; int h = 0; int hh = 0; deque< deque<int> *> * M = new deque< deque<int> *>(); for (int ry = 0; ry <= rad; ry++) { if (ry < rows) { deque<int> * q = new deque<int>(); M->push_back(q); for (int rx = 0; rx <= rad; rx++) { if (rx < cols) { int val = mat[ry][rx]; q->push_back(val); h += val; } } } } deque<int> * C = new deque<int>(M->front()->size()); deque<int> * Q = new deque<int>(M->front()->size()); deque<int> * R = new deque<int>(M->size()); deque< deque<int> *>::iterator mit; deque< deque<int> *>::iterator mstart = M->begin(); deque< deque<int> *>::iterator mend = M->end(); deque<int>::iterator rit; deque<int>::iterator rstart = R->begin(); deque<int>::iterator rend = R->end(); deque<int>::iterator cit; deque<int>::iterator cstart = C->begin(); deque<int>::iterator cend = C->end(); for (mit = mstart, rit = rstart; mit != mend, rit != rend; ++mit, ++rit) { deque<int>::iterator pit; deque<int>::iterator pstart = (* mit)->begin(); deque<int>::iterator pend = (* mit)->end(); for(cit = cstart, pit = pstart; cit != cend && pit != pend; ++cit, ++pit) { (* cit) += (* pit); (* rit) += (* pit); } } for (i = 0; i < rows; ++i) { j = 0; if (i - rad > 0) { deque<int>::iterator cit; deque<int>::iterator cstart = C->begin(); deque<int>::iterator cend = C->end(); deque<int>::iterator pit; deque<int>::iterator pstart = (M->front())->begin(); deque<int>::iterator pend = (M->front())->end(); for(cit = cstart, pit = pstart; cit != cend; ++cit, ++pit) { (* cit) -= (* pit); } deque<int> * k = M->front(); M->pop_front(); delete k; h -= R->front(); R->pop_front(); } int row = i + rad; if (row < rows && i > 0) { deque<int> * newQ = new deque<int>(); M->push_back(newQ); deque<int>::iterator cit; deque<int>::iterator cstart = C->begin(); deque<int>::iterator cend = C->end(); int rx; int tot = 0; for (rx = 0, cit = cstart; rx <= rad; rx++, ++cit) { if (rx < cols) { int val = mat[row][rx]; newQ->push_back(val); (* cit) += val; tot += val; } } R->push_back(tot); h += tot; } hh = h; copy(C->begin(), C->end(), Q->begin()); for (j = 1; j < cols; j++) { int col = j + rad; if (j - rad > 0) { hh -= Q->front(); Q->pop_front(); } if (j + rad < cols) { int val = 0; for (int ry =- rad; ry <= rad; ry++) { int y = i + ry; if (y >= 0 && y < rows) { val += mat[y][col]; } } hh += val; Q->push_back(val); } } } } And finally its Java version: public static void opt2Windowing(int [][] mat, int rad){ int cols = mat[0].length; int rows = mat.length; int i = 0; int j = 0; int h = 0; int hh = 0; LinkedList<LinkedList<Integer>> M = new LinkedList<LinkedList<Integer>>(); for (int ry = 0; ry <= rad; ry++) { if (ry < rows) { LinkedList<Integer> q = new LinkedList<Integer>(); M.addLast(q); for (int rx = 0; rx <= rad; rx++) { if (rx < cols) { int val = mat[ry][rx]; q.addLast(val); h += val; } } } } int firstSize = M.getFirst().size(); int mSize = M.size(); LinkedList<Integer> C = new LinkedList<Integer>(); LinkedList<Integer> Q = null; LinkedList<Integer> R = new LinkedList<Integer>(); for (int k = 0; k < firstSize; k++) { C.add(0); } for (int k = 0; k < mSize; k++) { R.add(0); } ListIterator<LinkedList<Integer>> mit; ListIterator<Integer> rit; ListIterator<Integer> cit; ListIterator<Integer> pit; for (mit = M.listIterator(), rit = R.listIterator(); mit.hasNext();) { Integer r = rit.next(); int rsum = 0; for (cit = C.listIterator(), pit = (mit.next()).listIterator(); cit.hasNext();) { Integer c = cit.next(); Integer p = pit.next(); rsum += p; cit.set(c + p); } rit.set(r + rsum); } for (i = 0; i < rows; ++i) { j = 0; if (i - rad > 0) { for(cit = C.listIterator(), pit = M.getFirst().listIterator(); cit.hasNext();) { Integer c = cit.next(); Integer p = pit.next(); cit.set(c - p); } M.removeFirst(); h -= R.getFirst(); R.removeFirst(); } int row = i + rad; if (row < rows && i > 0) { LinkedList<Integer> newQ = new LinkedList<Integer>(); M.addLast(newQ); int rx; int tot = 0; for (rx = 0, cit = C.listIterator(); rx <= rad; rx++) { if (rx < cols) { Integer c = cit.next(); int val = mat[row][rx]; newQ.addLast(val); cit.set(c + val); tot += val; } } R.addLast(tot); h += tot; } hh = h; Q = new LinkedList<Integer>(); Q.addAll(C); for (j = 1; j < cols; j++) { int col = j + rad; if (j - rad > 0) { hh -= Q.getFirst(); Q.pop(); } if (j + rad < cols) { int val = 0; for (int ry =- rad; ry <= rad; ry++) { int y = i + ry; if (y >= 0 && y < rows) { val += mat[y][col]; } } hh += val; Q.addLast(val); } } } } I guess that most is due to the poor choice of the LinkedList in Java and to the lack of an efficient (not shallow) copy method between two LinkedList. How can I improve the third Java method? Am I doing some conceptual error? As always, any criticisms is welcome. UPDATE Even if it does not solve the issue, using ArrayLists, as being suggested, instead of LinkedList improves the third method. The second one performs still better (but when the number of rows and columns of the matrix is lower than 300 and the window radius is small the first unoptimized method is the fastest in Java)

Read the article
Matlab: Optimization by perturbing variable

- by S_H

My main script contains following code: %# Grid and model parameters nModel=50; nModel_want=1; nI_grid1=5; Nth=1; nRow.Scale1=5; nCol.Scale1=5; nRow.Scale2=5^2; nCol.Scale2=5^2; theta = 90; % degrees a_minor = 2; % range along minor direction a_major = 5; % range along major direction sill = var(reshape(Deff_matrix_NthModel,nCell.Scale1,1)); % variance of the coarse data matrix of size nRow.Scale1 X nCol.Scale1 %# Covariance computation % Scale 1 for ihRow = 1:nRow.Scale1 for ihCol = 1:nCol.Scale1 [cov.Scale1(ihRow,ihCol),heff.Scale1(ihRow,ihCol)] = general_CovModel(theta, ihCol, ihRow, a_minor, a_major, sill, 'Exp'); end end % Scale 2 for ihRow = 1:nRow.Scale2 for ihCol = 1:nCol.Scale2 [cov.Scale2(ihRow,ihCol),heff.Scale2(ihRow,ihCol)] = general_CovModel(theta, ihCol/(nCol.Scale2/nCol.Scale1), ihRow/(nRow.Scale2/nRow.Scale1), a_minor, a_major, sill/(nRow.Scale2*nCol.Scale2), 'Exp'); end end %# Scale-up of fine scale values by averaging [covAvg.Scale2,var_covAvg.Scale2,varNorm_covAvg.Scale2] = general_AverageProperty(nRow.Scale2/nRow.Scale1,nCol.Scale2/nCol.Scale1,1,nRow.Scale1,nCol.Scale1,1,cov.Scale2,1); I am using two functions, general_CovModel() and general_AverageProperty(), in my main script which are given as following: function [cov,h_eff] = general_CovModel(theta, hx, hy, a_minor, a_major, sill, mod_type) % mod_type should be in strings angle_rad = theta*(pi/180); % theta in degrees, angle_rad in radians R_theta = [sin(angle_rad) cos(angle_rad); -cos(angle_rad) sin(angle_rad)]; h = [hx; hy]; lambda = a_minor/a_major; D_lambda = [lambda 0; 0 1]; h_2prime = D_lambda*R_theta*h; h_eff = sqrt((h_2prime(1)^2)+(h_2prime(2)^2)); if strcmp(mod_type,'Sph')==1 || strcmp(mod_type,'sph') ==1 if h_eff<=a cov = sill - sill.*(1.5*(h_eff/a_minor)-0.5*((h_eff/a_minor)^3)); else cov = sill; end elseif strcmp(mod_type,'Exp')==1 || strcmp(mod_type,'exp') ==1 cov = sill-(sill.*(1-exp(-(3*h_eff)/a_minor))); elseif strcmp(mod_type,'Gauss')==1 || strcmp(mod_type,'gauss') ==1 cov = sill-(sill.*(1-exp(-((3*h_eff)^2/(a_minor^2))))); end and function [PropertyAvg,variance_PropertyAvg,NormVariance_PropertyAvg]=... general_AverageProperty(blocksize_row,blocksize_col,blocksize_t,... nUpscaledRow,nUpscaledCol,nUpscaledT,PropertyArray,omega) % This function computes average of a property and variance of that averaged % property using power averaging PropertyAvg=zeros(nUpscaledRow,nUpscaledCol,nUpscaledT); %# Average of property for k=1:nUpscaledT, for j=1:nUpscaledCol, for i=1:nUpscaledRow, sum=0; for a=1:blocksize_row, for b=1:blocksize_col, for c=1:blocksize_t, sum=sum+(PropertyArray((i-1)*blocksize_row+a,(j-1)*blocksize_col+b,(k-1)*blocksize_t+c).^omega); % add all the property values in 'blocksize_x','blocksize_y','blocksize_t' to one variable end end end PropertyAvg(i,j,k)=(sum/(blocksize_row*blocksize_col*blocksize_t)).^(1/omega); % take average of the summed property end end end %# Variance of averageed property variance_PropertyAvg=var(reshape(PropertyAvg,... nUpscaledRow*nUpscaledCol*nUpscaledT,1),1,1); %# Normalized variance of averageed property NormVariance_PropertyAvg=variance_PropertyAvg./(var(reshape(... PropertyArray,numel(PropertyArray),1),1,1)); Question: Using Matlab, I would like to optimize covAvg.Scale2 such that it matches closely with cov.Scale1 by perturbing/varying any (or all) of the following variables 1) a_minor 2) a_major 3) theta Thanks.

Read the article
Project Euler: Programmatic Optimization for Problem 7?

- by bmucklow

So I would call myself a fairly novice programmer as I focused mostly on hardware in my schooling and not a lot of Computer Science courses. So I solved Problem 7 of Project Euler: By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13. What is the 10001st prime number? I managed to solve this without problem in Java, but when I ran my solution it took 8 and change seconds to run. I was wondering how this could be optimized from a programming standpoint, not a mathematical standpoint. Is the array looping and while statements the main things eating up processing time? And how could this be optimized? Again not looking for a fancy mathematical equation..there are plenty of those in the solution thread. SPOILER My solution is listed below. public class PrimeNumberList { private ArrayList<BigInteger> primesList = new ArrayList<BigInteger>(); public void fillList(int numberOfPrimes) { primesList.add(new BigInteger("2")); primesList.add(new BigInteger("3")); while (primesList.size() < numberOfPrimes){ getNextPrime(); } } private void getNextPrime() { BigInteger lastPrime = primesList.get(primesList.size()-1); BigInteger currentTestNumber = lastPrime; BigInteger modulusResult; boolean prime = false; while(!prime){ prime = true; currentTestNumber = currentTestNumber.add(new BigInteger("2")); for (BigInteger bi : primesList){ modulusResult = currentTestNumber.mod(bi); if (modulusResult.equals(BigInteger.ZERO)){ prime = false; break; } } if(prime){ primesList.add(currentTestNumber); } } } public BigInteger get(int primeTerm) { return primesList.get(primeTerm - 1); } }

Read the article
Compiler optimization of repeated accessor calls

- by apocalypse9

I've found recently that for some types of financial calculations that the following pattern is much easier to follow and test especially in situations where we may need to get numbers from various stages of the computation. public class nonsensical_calculator { ... double _rate; int _term; int _days; double monthlyRate { get { return _rate / 12; }} public double days { get { return (1 - i); }} double ar { get { return (1+ days) /(monthlyRate * days) double bleh { get { return Math.Pow(ar - days, _term) public double raar { get { return bleh * ar/2 * ar / days; }} .... } Obviously this often results in multiple calls to the same accessor within a given formula. I was curious as to whether or not the compiler is smart enough to optimize away these repeated calls with no intervening change in state, or whether this style is causing a decent performance hit. Further reading suggestions are always appreciated

Read the article
Retrieve 2 last posts for each category.

- by Savageman

Hello, Lets say I have 2 tables: blog_posts and categories. Each blog post belongs to only ONE category, so there is basically a foreign key between the 2 tables here. I would like to retrieve the 2 lasts posts from each category, is it possible to achieve this in a single request? GROUP BY would group everything and leave me with only one row in each category. But I want 2 of them. It would be easy to perform 1 + N query (N = number of category). First retrieve the categories. And then retrieve 2 posts from each category. I believe it would also be quite easy to perform M queries (M = number of posts I want from each category). First query selects the first post for each category (with a group by). Second query retrieves the second post for each category. etc. I'm just wondering if someone has a better solution for this. I don't really mind doing 1+N queries for that, but for curiosity and general SQL knowledge, it would be appreciated! Thanks in advance to whom can help me with this.

Read the article
Open space sitting optimization algorithm

- by Georgy Bolyuba

As a result of changes in the company, we have to rearrange our sitting plan: one room with 10 desks in it. Some desks are more popular than others for number of reasons. One solution would be to draw a desk number from a hat. We think there is a better way to do it. We have 10 desks and 10 people. Lets give every person in this contest 50 hypothetical tokens to bid on the desks. There is no limit of how much you bid on one desk, you can put all 50, which would be saying "I want to sit only here, period". You can also say "I do not care" by giving every desk 5 tokens. Important note: nobody knows what other people are doing. Everyone has to decide based only on his/her best interest (sounds familiar?) Now lets say we obtained these hypothetical results: # | Desk# >| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 1 | Alise | 30 | 2 | 2 | 1 | 0 | 0 | 0 | 15 | 0 | 0 | = 50 2 | Bob | 20 | 15 | 0 | 10 | 1 | 1 | 1 | 1 | 1 | 0 | = 50 ... 10 | Zed | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | = 50 Now, what we need to find is that one (or more) configuration(s) that gives us maximum satisfaction (i.e. people get desks they wanted taking into account all the bids and maximizing on the total of the group. Naturally the assumption is the more one bade on the desk the more he/she wants it). Since there are only 10 people, I think we can brute force it looking into all possible configurations, but I was wondering it there is a better algorithm for solving this kind of problems?

Read the article
Python optimization

- by Rami Jarrar

Hi, I do like this: f = open('wl4.txt', 'w') hh = 0 ###################################### for n in range(1,5): for l in range(33,127): if n==1: b = chr(l) + '\n' f.write(b) hh += 1 elif n==2: for s0 in range(33, 127): b = chr(l) + chr(s0) + '\n' f.write(b) hh += 1 elif n==3: for s0 in range(33, 127): for s1 in range(33, 127): b = chr(l) + chr(s0) + chr(s1) + '\n' f.write(b) hh += 1 elif n==4: for s0 in range(33, 127): for s1 in range(33, 127): for s2 in range(33,127): b = chr(l) + chr(s0) + chr(s1) + chr(s2) + '\n' f.write(b) hh += 1 ###################################### print "We Made %d Words." %(hh) ###################################### f.close() So, is there any method to make it faster?

Read the article
Compilier optimization of repeated accessor calls C#

- by apocalypse9

I've found recently that for some types of financial calculations that the following pattern is much easier to follow and test especially in situations where we may need to get numbers from various stages of the computation. public class nonsensical_calculator { ... double _rate; int _term; int _days; double monthlyRate { get { return _rate / 12; }} public double days { get { return (1 - i); }} double ar { get { return (1+ days) /(monthlyRate * days) double bleh { get { return Math.Pow(ar - days, _term) public double raar { get { return bleh * ar/2 * ar / days; }} .... } Obviously this often results in multiple calls to the same accessor within a given formula. I was curious as to whether or not the compiler is smart enough to optimize away these repeated calls with no intervening change in state, or whether this style is causing a decent performance hit. Further reading suggestions are always appreciated

Read the article
Python: Memory usage and optimization when modifying lists

- by xApple

The problem My concern is the following: I am storing a relativity large dataset in a classical python list and in order to process the data I must iterate over the list several times, perform some operations on the elements, and often pop an item out of the list. It seems that deleting one item out of a Python list costs O(N) since Python has to copy all the items above the element at hand down one place. Furthermore, since the number of items to delete is approximately proportional to the number of elements in the list this results in an O(N^2) algorithm. I am hoping to find a solution that is cost effective (time and memory-wise). I have studied what I could find on the internet and have summarized my different options below. Which one is the best candidate ? Keeping a local index: while processingdata: index = 0 while index < len(somelist): item = somelist[index] dosomestuff(item) if somecondition(item): del somelist[index] else: index += 1 This is the original solution I came up with. Not only is this not very elegant, but I am hoping there is better way to do it that remains time and memory efficient. Walking the list backwards: while processingdata: for i in xrange(len(somelist) - 1, -1, -1): dosomestuff(item) if somecondition(somelist, i): somelist.pop(i) This avoids incrementing an index variable but ultimately has the same cost as the original version. It also breaks the logic of dosomestuff(item) that wishes to process them in the same order as they appear in the original list. Making a new list: while processingdata: for i, item in enumerate(somelist): dosomestuff(item) newlist = [] for item in somelist: if somecondition(item): newlist.append(item) somelist = newlist gc.collect() This is a very naive strategy for eliminating elements from a list and requires lots of memory since an almost full copy of the list must be made. Using list comprehensions: while processingdata: for i, item in enumerate(somelist): dosomestuff(item) somelist[:] = [x for x in somelist if somecondition(x)] This is very elegant but under-the-cover it walks the whole list one more time and must copy most of the elements in it. My intuition is that this operation probably costs more than the original del statement at least memory wise. Keep in mind that somelist can be huge and that any solution that will iterate through it only once per run will probably always win. Using the filter function: while processingdata: for i, item in enumerate(somelist): dosomestuff(item) somelist = filter(lambda x: not subtle_condition(x), somelist) This also creates a new list occupying lots of RAM. Using the itertools' filter function: from itertools import ifilterfalse while processingdata: for item in itertools.ifilterfalse(somecondtion, somelist): dosomestuff(item) This version of the filter call does not create a new list but will not call dosomestuff on every item breaking the logic of the algorithm. I am including this example only for the purpose of creating an exhaustive list. Moving items up the list while walking while processingdata: index = 0 for item in somelist: dosomestuff(item) if not somecondition(item): somelist[index] = item index += 1 del somelist[index:] This is a subtle method that seems cost effective. I think it will move each item (or the pointer to each item ?) exactly once resulting in an O(N) algorithm. Finally, I hope Python will be intelligent enough to resize the list at the end without allocating memory for a new copy of the list. Not sure though. Abandoning Python lists: class Doubly_Linked_List: def __init__(self): self.first = None self.last = None self.n = 0 def __len__(self): return self.n def __iter__(self): return DLLIter(self) def iterator(self): return self.__iter__() def append(self, x): x = DLLElement(x) x.next = None if self.last is None: x.prev = None self.last = x self.first = x self.n = 1 else: x.prev = self.last x.prev.next = x self.last = x self.n += 1 class DLLElement: def __init__(self, x): self.next = None self.data = x self.prev = None class DLLIter: etc... This type of object resembles a python list in a limited way. However, deletion of an element is guaranteed O(1). I would not like to go here since this would require massive amounts of code refactoring almost everywhere.

Read the article
Solver Foundation Optimization - 1D Bin Packing

- by Val Nolav

I want to optimize loading marbles into trucks. I do not know, if I can use Solver Foundation class for that purpose. Before, I start writing code, I wanted to ask it here. 1- Marbles can be in any weight between 1 to 24 Tons. 2 - A truck can hold maximum of 24 Tons. 3- It can be loaded as many marble cubes, as it can take for upto 24 tones, which means there is no Volume limitation. 4- There can be between 200 up to 500 different marbles depending on time. GOAL - The goal is to load marbles in minimum truck shipment. How can I do that without writing a lot of if conditions and for loops? Can I use Microsoft Solver Foundation for that purpose? I read the documentation provided by Microsoft however, I could not find a scenario similar to mine. M1+ M2 + M3 + .... Mn <=24 this is for one truck shipment. Let say there are 200 different Marbles and Marble weights are Float. Thanks

Read the article
optimization mvc code

- by user276640

i have such code var prj = _dataContext.Project.FirstOrDefault(p => p.isPopular == true); if (prj != null) { prj.isPopular = false; _dataContext.SaveChanges(); } prj = Details(id); prj.isPopular = true; _dataContext.SaveChanges(); idea-i have only one record with value true in field isPopular, so i get it and make false, then i get object by id and make it isPopular true. i don't like 2 calls on savechanges. any ideas?

Read the article
Mysql InnoDB performance optimization and indexing

- by Davide C

Hello everybody, I have 2 databases and I need to link information between two big tables (more than 3M entries each, continuously growing). The 1st database has a table 'pages' that stores various information about web pages, and includes the URL of each one. The column 'URL' is a varchar(512) and has no index. The 2nd database has a table 'urlHops' defined as: CREATE TABLE urlHops ( dest varchar(512) NOT NULL, src varchar(512) DEFAULT NULL, timestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, KEY dest_key (dest), KEY src_key (src) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 Now, I need basically to issue (efficiently) queries like this: select p.id,p.URL from db1.pages p, db2.urlHops u where u.src=p.URL and u.dest=? At first, I thought to add an index on pages(URL). But it's a very long column, and I already issue a lot of INSERTs and UPDATEs on the same table (way more than the number of SELECTs I would do using this index). Other possible solutions I thought are: -adding a column to pages, storing the md5 hash of the URL and indexing it; this way I could do queries using the md5 of the URL, with the advantage of an index on a smaller column. -adding another table that contains only page id and page URL, indexing both columns. But this is maybe a waste of space, having only the advantage of not slowing down the inserts and updates I execute on 'pages'. I don't want to slow down the inserts and updates, but at the same time I would be able to do the queries on the URL efficiently. Any advice? My primary concern is performance; if needed, wasting some disk space is not a problem. Thank you, regards Davide

Read the article
code optimization; switch versus if's

- by KaiserJohaan

Hello, I have a question about whether to use 'case' or 'ifs' in a function that gets called quite alot. Here's the following as it is now, in 'ifs'; the code is self-explanatory: int identifyMsg(char* textbuff) { if (!strcmp(textbuff,"text")) { return 1; } if (!strcmp(textbuff,"name")) { return 2; } if (!strcmp(textbuff,"list")) { return 3; } if (!strcmp(textbuff,"remv")) { return 4; } if (!strcmp(textbuff,"ipad")) { return 5; } if (!strcmp(textbuff,"iprm")) { return 6; } return 0; } My question is: Would a switch perform better? I know if using ifs, I can place the most likely options at the top.

Read the article
(x86) Assembler Optimization

- by Pindatjuh

I'm building a compiler/assembler/linker in Java for the x86-32 (IA32) processor targeting Windows. High-level concepts of a "language" (in essential a Java API for creating executables) are translated into opcodes, which then are wrapped and outputted to a file. The translation process has several phases, one is the translation between languages: the highest-level code is translated into the medium-level code which is then translated into the lowest-level code (probably more than 3 levels). My problem is the following; if I have higher-level code (X and Y) translated to lower-level code (x, y, U and V), then an example of such a translation is, in pseudo-code: x + U(f) // generated by X + V(f) + y // generated by Y (An easy example) where V is the opposite of U (compare with a stack push as U and a pop as V). This needs to be 'optimized' into: x + y (essentially removing the "useless" code) My idea was to use regular expressions. For the above case, it'll be a regular expression looking like this: x:(U(x)+V(x)):null, meaning for all x find U(x) followed by V(x) and replace by null. Imagine more complex regular expressions, for more complex optimizations. This should work on all levels. What do you suggest? What would be a good approach to optimize in these situations?

Read the article
C# 'is' type check on struct - odd .NET 4.0 x86 optimization behavior

- by Jacob Stanley

Since upgrading to VS2010 I'm getting some very strange behavior with the 'is' keyword. The program below (test.cs) outputs True when compiled in debug mode (for x86) and False when compiled with optimizations on (for x86). Compiling all combinations in x64 or AnyCPU gives the expected result, True. All combinations of compiling under .NET 3.5 give the expected result, True. I'm using the batch file below (runtest.bat) to compile and test the code using various combinations of compiler .NET framework. Has anyone else seen these kind of problems under .NET 4.0? Does everyone else see the same behavior as me on their computer when running runtests.bat? #@$@#$?? Is there a fix for this? test.cs using System; public class Program { public static bool IsGuid(object item) { return item is Guid; } public static void Main() { Console.Write(IsGuid(Guid.NewGuid())); } } runtest.bat @echo off rem Usage: rem runtest -- runs with csc.exe x86 .NET 4.0 rem runtest 64 -- runs with csc.exe x64 .NET 4.0 rem runtest v3.5 -- runs with csc.exe x86 .NET 3.5 rem runtest v3.5 64 -- runs with csc.exe x64 .NET 3.5 set version=v4.0.30319 set platform=Framework for %%a in (%*) do ( if "%%a" == "64" (set platform=Framework64) if "%%a" == "v3.5" (set version=v3.5) ) echo Compiler: %platform%\%version%\csc.exe set csc="C:\Windows\Microsoft.NET\%platform%\%version%\csc.exe" set make=%csc% /nologo /nowarn:1607 test.cs rem CS1607: Referenced assembly targets a different processor rem This happens if you compile for x64 using csc32, or x86 using csc64 %make% /platform:x86 test.exe echo =^> x86 %make% /platform:x86 /optimize test.exe echo =^> x86 (Optimized) %make% /platform:x86 /debug test.exe echo =^> x86 (Debug) %make% /platform:x86 /debug /optimize test.exe echo =^> x86 (Debug + Optimized) %make% /platform:x64 test.exe echo =^> x64 %make% /platform:x64 /optimize test.exe echo =^> x64 (Optimized) %make% /platform:x64 /debug test.exe echo =^> x64 (Debug) %make% /platform:x64 /debug /optimize test.exe echo =^> x64 (Debug + Optimized) %make% /platform:AnyCPU test.exe echo =^> AnyCPU %make% /platform:AnyCPU /optimize test.exe echo =^> AnyCPU (Optimized) %make% /platform:AnyCPU /debug test.exe echo =^> AnyCPU (Debug) %make% /platform:AnyCPU /debug /optimize test.exe echo =^> AnyCPU (Debug + Optimized) Test Results When running the runtest.bat I get the following results on my Win7 x64 install. > runtest 32 v4.0 Compiler: Framework\v4.0.30319\csc.exe False => x86 False => x86 (Optimized) True => x86 (Debug) False => x86 (Debug + Optimized) True => x64 True => x64 (Optimized) True => x64 (Debug) True => x64 (Debug + Optimized) True => AnyCPU True => AnyCPU (Optimized) True => AnyCPU (Debug) True => AnyCPU (Debug + Optimized) > runtest 64 v4.0 Compiler: Framework64\v4.0.30319\csc.exe False => x86 False => x86 (Optimized) True => x86 (Debug) False => x86 (Debug + Optimized) True => x64 True => x64 (Optimized) True => x64 (Debug) True => x64 (Debug + Optimized) True => AnyCPU True => AnyCPU (Optimized) True => AnyCPU (Debug) True => AnyCPU (Debug + Optimized) > runtest 32 v3.5 Compiler: Framework\v3.5\csc.exe True => x86 True => x86 (Optimized) True => x86 (Debug) True => x86 (Debug + Optimized) True => x64 True => x64 (Optimized) True => x64 (Debug) True => x64 (Debug + Optimized) True => AnyCPU True => AnyCPU (Optimized) True => AnyCPU (Debug) True => AnyCPU (Debug + Optimized) > runtest 64 v3.5 Compiler: Framework64\v3.5\csc.exe True => x86 True => x86 (Optimized) True => x86 (Debug) True => x86 (Debug + Optimized) True => x64 True => x64 (Optimized) True => x64 (Debug) True => x64 (Debug + Optimized) True => AnyCPU True => AnyCPU (Optimized) True => AnyCPU (Debug) True => AnyCPU (Debug + Optimized) tl;dr

Read the article
Java variables -> replace? RAM optimization

- by poeschlorn

Hi guys, I just wanted to know what happens behind my program when I declare and initialize a variable and later initialize it again with other values, e.g. an ArrayList or something similar. What happens in my RAM, when I say e.g. this: ArrayList<String> al = new ArrayList<String>(); ...add values, work with it and so on.... al = new ArrayList<String>(); So is my first ArrayList held in RAM or will the second ArrayList be stored on the same position where the first one has been before? Or will it just change the reference of "al"? If it is not replaced...is there a way to manually free the RAM which was occupied by the first arraylist? (without waiting for the garbage collector) Would it help to set it first =null? Nice greetings, poeschlorn

Read the article
Haskell optimization of a function looking for a bytestring terminator

- by me2

Profiling of some code showed that about 65% of the time I was inside the following code. What it does is use the Data.Binary.Get monad to walk through a bytestring looking for the terminator. If it detects 0xff, it checks if the next byte is 0x00. If it is, it drops the 0x00 and continues. If it is not 0x00, then it drops both bytes and the resulting list of bytes is converted to a bytestring and returned. Any obvious ways to optimize this? I can't see it. parseECS = f [] False where f acc ff = do b <- getWord8 if ff then if b == 0x00 then f (0xff:acc) False else return $ L.pack (reverse acc) else if b == 0xff then f acc True else f (b:acc) False

Read the article
compile time if && return string reference optimization

- by Truncheon

Hi. I'm writing a series classes that inherit from a base class using virtual. They are INT, FLOAT and STRING objects that I want to use in a scripting language. I'm trying to implement weak typing, but I don't want STRING objects to return copies of themselves when used in the following way (instead I would prefer to have a reference returned which can be used in copying): a = "hello "; b = "world"; c = a + b; I have written the following code as a mock example: #include <iostream> #include <string> #include <cstdio> #include <cstdlib> std::string dummy("<int object cannot return string reference>"); struct BaseImpl { virtual bool is_string() = 0; virtual int get_int() = 0; virtual std::string get_string_copy() = 0; virtual std::string const& get_string_ref() = 0; }; struct INT : BaseImpl { int value; INT(int i = 0) : value(i) { std::cout << "constructor called\n"; } INT(BaseImpl& that) : value(that.get_int()) { std::cout << "copy constructor called\n"; } bool is_string() { return false; } int get_int() { return value; } std::string get_string_copy() { char buf[33]; sprintf(buf, "%i", value); return buf; } std::string const& get_string_ref() { return dummy; } }; struct STRING : BaseImpl { std::string value; STRING(std::string s = "") : value(s) { std::cout << "constructor called\n"; } STRING(BaseImpl& that) { if (that.is_string()) value = that.get_string_ref(); else value = that.get_string_copy(); std::cout << "copy constructor called\n"; } bool is_string() { return true; } int get_int() { return atoi(value.c_str()); } std::string get_string_copy() { return value; } std::string const& get_string_ref() { return value; } }; struct Base { BaseImpl* impl; Base(BaseImpl* p = 0) : impl(p) {} ~Base() { delete impl; } }; int main() { Base b1(new INT(1)); Base b2(new STRING("Hello world")); Base b3(new INT(*b1.impl)); Base b4(new STRING(*b2.impl)); std::cout << "\n"; std::cout << b1.impl->get_int() << "\n"; std::cout << b2.impl->get_int() << "\n"; std::cout << b3.impl->get_int() << "\n"; std::cout << b4.impl->get_int() << "\n"; std::cout << "\n"; std::cout << b1.impl->get_string_ref() << "\n"; std::cout << b2.impl->get_string_ref() << "\n"; std::cout << b3.impl->get_string_ref() << "\n"; std::cout << b4.impl->get_string_ref() << "\n"; std::cout << "\n"; std::cout << b1.impl->get_string_copy() << "\n"; std::cout << b2.impl->get_string_copy() << "\n"; std::cout << b3.impl->get_string_copy() << "\n"; std::cout << b4.impl->get_string_copy() << "\n"; return 0; } It was necessary to add an if check in the STRING class to determine whether its safe to request a reference instead of a copy: Script code: a = "test"; b = a; c = 1; d = "" + c; /* not safe to request reference by standard */ C++ code: STRING(BaseImpl& that) { if (that.is_string()) value = that.get_string_ref(); else value = that.get_string_copy(); std::cout << "copy constructor called\n"; } If was hoping there's a way of moving that if check into compile time, rather than run time.

Read the article
Code optimization - Unused methods

- by Yochai Timmer

How can I tell if a method will never be used ? I know that for dll files and libraries you can't really know if someone else (another project) will ever use the code. In general I assume that anything public might be used somewhere else. But what about private methods ? Is it safe to assume that if I don't see an explicit call to that method, it won't be used ? I assume that for private methods it's easier to decide. But is it safe to decide it ONLY for private methods ?

Read the article
Python optimization problem?

- by user342079

Alright, i had this homework recently (don't worry, i've already done it, but in c++) but I got curious how i could do it in python. The problem is about 2 light sources that emit light. I won't get into details tho. Here's the code (that I've managed to optimize a bit in the latter part): import math, array import numpy as np from PIL import Image size = (800,800) width, height = size s1x = width * 1./8 s1y = height * 1./8 s2x = width * 7./8 s2y = height * 7./8 r,g,b = (255,255,255) arr = np.zeros((width,height,3)) hy = math.hypot print 'computing distances (%s by %s)'%size, for i in xrange(width): if i%(width/10)==0: print i, if i%20==0: print '.', for j in xrange(height): d1 = hy(i-s1x,j-s1y) d2 = hy(i-s2x,j-s2y) arr[i][j] = abs(d1-d2) print '' arr2 = np.zeros((width,height,3),dtype="uint8") for ld in [200,116,100,84,68,52,36,20,8,4,2]: print 'now computing image for ld = '+str(ld) arr2 *= 0 arr2 += abs(arr%ld-ld/2)*(r,g,b)/(ld/2) print 'saving image...' ar2img = Image.fromarray(arr2) ar2img.save('ld'+str(ld).rjust(4,'0')+'.png') print 'saved as ld'+str(ld).rjust(4,'0')+'.png' I have managed to optimize most of it, but there's still a huge performance gap in the part with the 2 for-s, and I can't seem to think of a way to bypass that using common array operations... I'm open to suggestions :D

Read the article
Optimization of Function with Dictionary and Zip()

- by eWizardII

Hello, I have the following function: def filetxt(): word_freq = {} lvl1 = [] lvl2 = [] total_t = 0 users = 0 text = [] for l in range(0,500): # Open File if os.path.exists("C:/Twitter/json/user_" + str(l) + ".json") == True: with open("C:/Twitter/json/user_" + str(l) + ".json", "r") as f: text_f = json.load(f) users = users + 1 for i in range(len(text_f)): text.append(text_f[str(i)]['text']) total_t = total_t + 1 else: pass # Filter occ = 0 import string for i in range(len(text)): s = text[i] # Sample string a = re.findall(r'(RT)',s) b = re.findall(r'(@)',s) occ = len(a) + len(b) + occ s = s.encode('utf-8') out = s.translate(string.maketrans("",""), string.punctuation) # Create Wordlist/Dictionary word_list = text[i].lower().split(None) for word in word_list: word_freq[word] = word_freq.get(word, 0) + 1 keys = word_freq.keys() numbo = range(1,len(keys)+1) WList = ', '.join(keys) NList = str(numbo).strip('[]') WList = WList.split(", ") NList = NList.split(", ") W2N = dict(zip(WList, NList)) for k in range (0,len(word_list)): word_list[k] = W2N[word_list[k]] for i in range (0,len(word_list)-1): lvl1.append(word_list[i]) lvl2.append(word_list[i+1]) I have used the profiler to find that it seems the greatest CPU time is spent on the zip() function and the join and split parts of the code, I'm looking to see if there is any way I have overlooked that I could potentially clean up the code to make it more optimized, since the greatest lag seems to be in how I am working with the dictionaries and the zip() function. Any help would be appreciated thanks!

Read the article
MySQL Optimization 20 gig table

- by user169743

I have a 20 gig table that has a large amount of inserts and updates daily. This table is also frequently searched. I'd like to know if the MySQL indices can become fragmented and perhaps need to be rebuilt or something similar. I'm finding it difficult to figure out which of the CHECK TABLE, REPAIR TABLE or something similar? Any guidance appreciated, I'm a db newb.

Read the article
Odd optimization problem under MSVC

- by Goz

I've seen this blog: http://igoro.com/archive/gallery-of-processor-cache-effects/ The "weirdness" in part 7 is what caught my interest. My first thought was "Thats just C# being weird". Its not I wrote the following C++ code. volatile int* p = (volatile int*)_aligned_malloc( sizeof( int ) * 8, 64 ); memset( (void*)p, 0, sizeof( int ) * 8 ); double dStart = t.GetTime(); for (int i = 0; i < 200000000; i++) { //p[0]++;p[1]++;p[2]++;p[3]++; // Option 1 //p[0]++;p[2]++;p[4]++;p[6]++; // Option 2 p[0]++;p[2]++; // Option 3 } double dTime = t.GetTime() - dStart; The timing I get on my 2.4 Ghz Core 2 Quad go as follows: Option 1 = ~8 cycles per loop. Option 2 = ~4 cycles per loop. Option 3 = ~6 cycles per loop. Now This is confusing. My reasoning behind the difference comes down to the cache write latency (3 cycles) on my chip and an assumption that the cache has a 128-bit write port (This is pure guess work on my part). On that basis in Option 1: It will increment p[0] (1 cycle) then increment p[2] (1 cycle) then it has to wait 1 cycle (for cache) then p[1] (1 cycle) then wait 1 cycle (for cache) then p[3] (1 cycle). Finally 2 cycles for increment and jump (Though its usually implemented as decrement and jump). This gives a total of 8 cycles. In Option 2: It can increment p[0] and p[4] in one cycle then increment p[2] and p[6] in another cycle. Then 2 cycles for subtract and jump. No waits needed on cache. Total 4 cycles. In option 3: It can increment p[0] then has to wait 2 cycles then increment p[2] then subtract and jump. The problem is if you set case 3 to increment p[0] and p[4] it STILL takes 6 cycles (which kinda blows my 128-bit read/write port out of the water). So ... can anyone tell me what the hell is going on here? Why DOES case 3 take longer? Also I'd love to know what I've got wrong in my thinking above, as i obviously have something wrong! Any ideas would be much appreciated! :) It'd also be interesting to see how GCC or any other compiler copes with it as well! Edit: Jerry Coffin's idea gave me some thoughts. I've done some more tests (on a different machine so forgive the change in timings) with and without nops and with different counts of nops case 2 - 0.46 00401ABD jne (401AB0h) 0 nops - 0.68 00401AB7 jne (401AB0h) 1 nop - 0.61 00401AB8 jne (401AB0h) 2 nops - 0.636 00401AB9 jne (401AB0h) 3 nops - 0.632 00401ABA jne (401AB0h) 4 nops - 0.66 00401ABB jne (401AB0h) 5 nops - 0.52 00401ABC jne (401AB0h) 6 nops - 0.46 00401ABD jne (401AB0h) 7 nops - 0.46 00401ABE jne (401AB0h) 8 nops - 0.46 00401ABF jne (401AB0h) 9 nops - 0.55 00401AC0 jne (401AB0h) I've included the jump statetements so you can see that the source and destination are in one cache line. You can also see that we start to get a difference when we are 13 bytes or more apart. Until we hit 16 ... then it all goes wrong. So Jerry isn't right (though his suggestion DOES help a bit), however something IS going on. I'm more and more intrigued to try and figure out what it is now. It does appear to be more some sort of memory alignment oddity rather than some sort of instruction throughput oddity. Anyone want to explain this for an inquisitive mind? :D Edit 3: Interjay has a point on the unrolling that blows the previous edit out of the water. With an unrolled loop the performance does not improve. You need to add a nop in to make the gap between jump source and destination the same as for my good nop count above. Performance still sucks. Its interesting that I need 6 nops to improve performance though. I wonder how many nops the processor can issue per cycle? If its 3 then that account for the cache write latency ... But, if thats it, why is the latency occurring? Curiouser and curiouser ...

Read the article
Haskell optimization of the following function

- by me2

Profiling of some code of mine showed that about 65% of the time I was running the following code. What it does is use the Data.Binary.Get monad to walk through a bytestring looking for the terminator. If it detects 0xff, it checks if the next byte is 0x00. If it is, it drops the 0x00 and continues. If it is not 0x00, then it drops both bytes and the resulting list of bytes is converted to a bytestring and returned. Any obvious ways to optimize this code? I can't see it. parseECS = f [] False where f acc ff = do b <- getWord8 if ff then if b == 0x00 then f (0xff:acc) False else return $ L.pack (reverse acc) else if b == 0xff then f acc True else f (b:acc) False

Read the article

< Previous Page | 12 13 14 15 16 17 18 19 20 21 22 23 | Next Page >