code optimization - Page 17

C++ iterators & loop optimization

- by Quantum7

I see a lot of c++ code that looks like this: for( const_iterator it = list.begin(), const_iterator ite = list.end(); it != ite; ++it) As opposed to the more concise version: for( const_iterator it = list.begin(); it != list.end(); ++it) Will there be any difference in speed between these two conventions? Naively the first will be slightly faster since list.end() is only called once. But since the iterator is const, it seems like the compiler will pull this test out of the loop, generating equivalent assembly for both.

Read the article

Constant embedded for loop condition optimization in C++ with gcc

- by solinent

Will a compiler optimize tihs: bool someCondition = someVeryTimeConsumingTask(/* ... */); for (int i=0; i<HUGE_INNER_LOOP; ++i) { if (someCondition) doCondition(i); else bacon(i); } into: bool someCondition = someVeryTimeConsumingTask(/* ... */); if (someCondition) for (int i=0; i<HUGE_INNER_LOOP; ++i) doCondition(i); else for (int i=0; i<HUGE_INNER_LOOP; ++i) bacon(i); someCondition is trivially constant within the for loop. This may seem obvious and that I should do this myself, but if you have more than one condition then you are dealing with permuatations of for loops, so the code would get quite a bit longer. I am deciding on whether to do it (I am already optimizing) or whether it will be a waste of my time.

Read the article

Pluggable Rules for Entity Framework Code First

- by Ricardo Peres

Suppose you want a system that lets you plug custom validation rules on your Entity Framework context. The rules would control whether an entity can be saved, updated or deleted, and would be implemented in plain .NET. Yes, I know I already talked about plugable validation in Entity Framework Code First, but this is a different approach. An example API is in order, first, a ruleset, which will hold the collection of rules: 1: public interface IRuleset : IDisposable 2: { 3: void AddRule<T>(IRule<T> rule); 4: IEnumerable<IRule<T>> GetRules<T>(); 5: } Next, a rule: 1: public interface IRule<T> 2: { 3: Boolean CanSave(T entity, DbContext ctx); 4: Boolean CanUpdate(T entity, DbContext ctx); 5: Boolean CanDelete(T entity, DbContext ctx); 6: String Name 7: { 8: get; 9: } 10: } Let’s analyze what we have, starting with the ruleset: Only has methods for adding a rule, specific to an entity type, and to list all rules of this entity type; By implementing IDisposable, we allow it to be cancelled, by disposing of it when we no longer want its rules to be applied. A rule, on the other hand: Has discrete methods for checking if a given entity can be saved, updated or deleted, which receive as parameters the entity itself and a pointer to the DbContext to which the ruleset was applied; Has a name property for helping us identifying what failed. A ruleset really doesn’t need a public implementation, all we need is its interface. The private (internal) implementation might look like this: 1: sealed class Ruleset : IRuleset 2: { 3: private readonly IDictionary<Type, HashSet<Object>> rules = new Dictionary<Type, HashSet<Object>>(); 4: private ObjectContext octx = null; 5: 6: internal Ruleset(ObjectContext octx) 7: { 8: this.octx = octx; 9: } 10: 11: public void AddRule<T>(IRule<T> rule) 12: { 13: if (this.rules.ContainsKey(typeof(T)) == false) 14: { 15: this.rules[typeof(T)] = new HashSet<Object>(); 16: } 17: 18: this.rules[typeof(T)].Add(rule); 19: } 20: 21: public IEnumerable<IRule<T>> GetRules<T>() 22: { 23: if (this.rules.ContainsKey(typeof(T)) == true) 24: { 25: foreach (IRule<T> rule in this.rules[typeof(T)]) 26: { 27: yield return (rule); 28: } 29: } 30: } 31: 32: public void Dispose() 33: { 34: this.octx.SavingChanges -= RulesExtensions.OnSaving; 35: RulesExtensions.rulesets.Remove(this.octx); 36: this.octx = null; 37: 38: this.rules.Clear(); 39: } 40: } Basically, this implementation: Stores the ObjectContext of the DbContext to which it was created for, this is so that later we can remove the association; Has a collection - a set, actually, which does not allow duplication - of rules indexed by the real Type of an entity (because of proxying, an entity may be of a type that inherits from the class that we declared); Has generic methods for adding and enumerating rules of a given type; Has a Dispose method for cancelling the enforcement of the rules. A (really dumb) rule applied to Product might look like this: 1: class ProductRule : IRule<Product> 2: { 3: #region IRule<Product> Members 4: 5: public String Name 6: { 7: get 8: { 9: return ("Rule 1"); 10: } 11: } 12: 13: public Boolean CanSave(Product entity, DbContext ctx) 14: { 15: return (entity.Price > 10000); 16: } 17: 18: public Boolean CanUpdate(Product entity, DbContext ctx) 19: { 20: return (true); 21: } 22: 23: public Boolean CanDelete(Product entity, DbContext ctx) 24: { 25: return (true); 26: } 27: 28: #endregion 29: } The DbContext is there because we may need to check something else in the database before deciding whether to allow an operation or not. And here’s how to apply this mechanism to any DbContext, without requiring the usage of a subclass, by means of an extension method: 1: public static class RulesExtensions 2: { 3: private static readonly MethodInfo getRulesMethod = typeof(IRuleset).GetMethod("GetRules"); 4: internal static readonly IDictionary<ObjectContext, Tuple<IRuleset, DbContext>> rulesets = new Dictionary<ObjectContext, Tuple<IRuleset, DbContext>>(); 5: 6: private static Type GetRealType(Object entity) 7: { 8: return (entity.GetType().Assembly.IsDynamic == true ? entity.GetType().BaseType : entity.GetType()); 9: } 10: 11: internal static void OnSaving(Object sender, EventArgs e) 12: { 13: ObjectContext octx = sender as ObjectContext; 14: IRuleset ruleset = rulesets[octx].Item1; 15: DbContext ctx = rulesets[octx].Item2; 16: 17: foreach (ObjectStateEntry entry in octx.ObjectStateManager.GetObjectStateEntries(EntityState.Added)) 18: { 19: Object entity = entry.Entity; 20: Type realType = GetRealType(entity); 21: 22: foreach (dynamic rule in (getRulesMethod.MakeGenericMethod(realType).Invoke(ruleset, null) as IEnumerable)) 23: { 24: if (rule.CanSave(entity, ctx) == false) 25: { 26: throw (new Exception(String.Format("Cannot save entity {0} due to rule {1}", entity, rule.Name))); 27: } 28: } 29: } 30: 31: foreach (ObjectStateEntry entry in octx.ObjectStateManager.GetObjectStateEntries(EntityState.Deleted)) 32: { 33: Object entity = entry.Entity; 34: Type realType = GetRealType(entity); 35: 36: foreach (dynamic rule in (getRulesMethod.MakeGenericMethod(realType).Invoke(ruleset, null) as IEnumerable)) 37: { 38: if (rule.CanDelete(entity, ctx) == false) 39: { 40: throw (new Exception(String.Format("Cannot delete entity {0} due to rule {1}", entity, rule.Name))); 41: } 42: } 43: } 44: 45: foreach (ObjectStateEntry entry in octx.ObjectStateManager.GetObjectStateEntries(EntityState.Modified)) 46: { 47: Object entity = entry.Entity; 48: Type realType = GetRealType(entity); 49: 50: foreach (dynamic rule in (getRulesMethod.MakeGenericMethod(realType).Invoke(ruleset, null) as IEnumerable)) 51: { 52: if (rule.CanUpdate(entity, ctx) == false) 53: { 54: throw (new Exception(String.Format("Cannot update entity {0} due to rule {1}", entity, rule.Name))); 55: } 56: } 57: } 58: } 59: 60: public static IRuleset CreateRuleset(this DbContext context) 61: { 62: Tuple<IRuleset, DbContext> ruleset = null; 63: ObjectContext octx = (context as IObjectContextAdapter).ObjectContext; 64: 65: if (rulesets.TryGetValue(octx, out ruleset) == false) 66: { 67: ruleset = rulesets[octx] = new Tuple<IRuleset, DbContext>(new Ruleset(octx), context); 68: 69: octx.SavingChanges += OnSaving; 70: } 71: 72: return (ruleset.Item1); 73: } 74: } It relies on the SavingChanges event of the ObjectContext to intercept the saving operations before they are actually issued. Yes, it uses a bit of dynamic magic! Very handy, by the way! So, let’s put it all together: 1: using (MyContext ctx = new MyContext()) 2: { 3: IRuleset rules = ctx.CreateRuleset(); 4: rules.AddRule(new ProductRule()); 5: 6: ctx.Products.Add(new Product() { Name = "xyz", Price = 50000 }); 7: 8: ctx.SaveChanges(); //an exception is fired here 9: 10: //when we no longer need to apply the rules 11: rules.Dispose(); 12: } Feel free to use it and extend it any way you like, and do give me your feedback! As a final note, this can be easily changed to support plain old Entity Framework (not Code First, that is), if that is what you are using.

Read the article

Gathering all data in single iteration vs using functions for readable code

- by user828584

Say I have an array of runners with which I need to find the tallest runner, the fastest runner, and the lightest runner. It seems like the most readable solution would be: runners = getRunners(); tallestRunner = getTallestRunner(runners); fastestRunner = getFastestRunner(runners); lightestRunner = getLightestRunner(runners); ..where each function iterates over the runners and keeps track of the largest height, greatest speed, and lowest weight. Iterating over the array three times, however, doesn't seem like a very good idea. It would instead be better to do: int greatestHeght, greatestSpeed, leastWeight; Runner tallestRunner, fastestRunner, lightestRunner; for(runner in runners){ if(runner.height > greatestHeight) { greatestHeight = runner.height; tallestRunner = runner; } if(runner.speed > ... } While this isn't too unreadable, it can get messy when there is more logic for each piece of information being extracted in the iteration. What's the middle ground here? How can I use only a single iteration while still keeping the code divided into logical units?

Read the article

Formatting php, what works more efficiently?

- by JamesM-SiteGen

Hello fellow programmers, I was just wondering what makes php work faster, I have a few methods that I always go and do, but that only improves the way I can read it, but how about the interpreter? Should I include the curly braces when there is only one statement to run? if(...){ echo "test"; } # Or.. if(...) echo "test"; === Which should be used? I have also found http://beta.phpformatter.com/ and I find the following settings to be good, but are they? Indentation: Indentation style: {K&R (One true brace style)} Indent with: {Tabs} Starting indentation: [1] Indentation: [1] Common: [x] Remove all comments [x] Remove empty lines [x] Align assignments statements nicely [ ] Put a comment with the condition after if, while, for, foreach, declare and catch statements Improvement: [x] Remove lines with just a semicolon (;) [x] Make normal comments (//) from perl comments (#) [x] Make long opening tag (<?php) from short one (<?) Brackets: [x] Space inside brackets- ( ) [x] Space inside empty brackets- ( ) [x] Space inside block brackets- [ ] [x] Space inside empty block brackets- [ ] Tiny var names: often I go through my code and change $var1 to $a, $var2 to $b and so on. I do include comments at the start of the file to show to me what each letter(s) mean.. Final note: So am I doing the right thing with the curly braces and the settings? Are there any great tips that help it run faster?

Read the article

Duplication of code (backend and javascript - knockout)

- by Michal B.

We have a new developer in our team. He seems a smart guy (he just came in so I cannot really judge). He started with implementing some small enhancements in the project (MVC3 web application using javascript - jquery and knockout). Let's say we have two values: A - quite complex calculation C - constant B = A + C On the screen there is value B and user can change it (normal texbox). When B changes, A changes as well because C is constant. So there is linear dependency between A and B. Now, all the calculations are done in the backend, but we need to recalculate A as user changes B (in js, I would use knockout). I thought about storing old A and B and when B changes by 10 then we know that new A will be old A + 10. He says this is dirty, because it's duplication of code (we make use of the fact that they are dependent and according to him that should be only in one place in our app). I understand it's not ideal, but making AJAX request after every key press seems a bit too much. It's a really small thing and I would not post if we haven't had long discussion about it. How do you deal with such problems? Also I can imagine that using knockout implies lots of calculations on the client side, which very often leads to duplication of the same calculations from the backend. Does anyone have links to some articles/thoughts on this topic?

Read the article

Augmenting functionality of subclasses without code duplication in C++

- by Rob W

I have to add common functionality to some classes that share the same superclass, preferably without bloating the superclass. The simplified inheritance chain looks like this: Element -> HTMLElement -> HTMLAnchorElement Element -> SVGElement -> SVGAlement The default doSomething() method on Element is no-op by default, but there are some subclasses that need an actual implementation that requires some extra overridden methods and instance members. I cannot put a full implementation of doSomething() in Element because 1) it is only relevant for some of the subclasses, 2) its implementation has a performance impact and 3) it depends on a method that could be overridden by a class in the inheritance chain between the superclass and a subclass, e.g. SVGElement in my example. Especially because of the third point, I wanted to solve the problem using a template class, as follows (it is a kind of decorator for classes): struct Element { virtual void doSomething() {} }; // T should be an instance of Element template<class T> struct AugmentedElement : public T { // doSomething is expensive and uses T virtual void doSomething() override {} // Used by doSomething virtual bool shouldDoSomething() = 0; }; class SVGElement : public Element { /* ... */ }; class SVGAElement : public AugmentedElement<SVGElement> { // some non-trivial check bool shouldDoSomething() { /* ... */ return true; } }; // Similarly for HTMLAElement and others I looked around (in the existing (huge) codebase and on the internet), but didn't find any similar code snippets, let alone an evaluation of the effectiveness and pitfalls of this approach. Is my design the right way to go, or is there a better way to add common functionality to some subclasses of a given superclass?

Read the article

Solver Foundation Optimization - 1D Bin Packing

- by Val Nolav

I want to optimize loading marbles into trucks. I do not know, if I can use Solver Foundation class for that purpose. Before, I start writing code, I wanted to ask it here. 1- Marbles can be in any weight between 1 to 24 Tons. 2 - A truck can hold maximum of 24 Tons. 3- It can be loaded as many marble cubes, as it can take for upto 24 tones, which means there is no Volume limitation. 4- There can be between 200 up to 500 different marbles depending on time. GOAL - The goal is to load marbles in minimum truck shipment. How can I do that without writing a lot of if conditions and for loops? Can I use Microsoft Solver Foundation for that purpose? I read the documentation provided by Microsoft however, I could not find a scenario similar to mine. M1+ M2 + M3 + .... Mn <=24 this is for one truck shipment. Let say there are 200 different Marbles and Marble weights are Float. Thanks

Read the article

Python optimization problem?

- by user342079

Alright, i had this homework recently (don't worry, i've already done it, but in c++) but I got curious how i could do it in python. The problem is about 2 light sources that emit light. I won't get into details tho. Here's the code (that I've managed to optimize a bit in the latter part): import math, array import numpy as np from PIL import Image size = (800,800) width, height = size s1x = width * 1./8 s1y = height * 1./8 s2x = width * 7./8 s2y = height * 7./8 r,g,b = (255,255,255) arr = np.zeros((width,height,3)) hy = math.hypot print 'computing distances (%s by %s)'%size, for i in xrange(width): if i%(width/10)==0: print i, if i%20==0: print '.', for j in xrange(height): d1 = hy(i-s1x,j-s1y) d2 = hy(i-s2x,j-s2y) arr[i][j] = abs(d1-d2) print '' arr2 = np.zeros((width,height,3),dtype="uint8") for ld in [200,116,100,84,68,52,36,20,8,4,2]: print 'now computing image for ld = '+str(ld) arr2 *= 0 arr2 += abs(arr%ld-ld/2)*(r,g,b)/(ld/2) print 'saving image...' ar2img = Image.fromarray(arr2) ar2img.save('ld'+str(ld).rjust(4,'0')+'.png') print 'saved as ld'+str(ld).rjust(4,'0')+'.png' I have managed to optimize most of it, but there's still a huge performance gap in the part with the 2 for-s, and I can't seem to think of a way to bypass that using common array operations... I'm open to suggestions :D

Read the article

LinQ optimization

- by Budda

Here is a peace of code: void MyFunc(List<MyObj> objects) { MyFunc1(objects); foreach( MyObj obj in objects.Where(obj1=>obj1.Good)) { // Do Action With Good Object } } void MyFunc1(List<MyObj> objects) { int iGoodCount = objects.Where(obj1=>obj1.Good).Count(); BeHappy(iGoodCount); // do other stuff with 'objects' collection } Here we see that collection is analyzed twice and each time the value of 'Good' property is checked for each member: 1st time when calculating count of good objects, 2nd - when iterating through all good objects. It is desirable to have that optimized, and here is a straightforward solution: before call to MyFunc1 makecreate an additional temporary collection of good objects only (goodObjects, it can be IEnumerable); get count of these objects and pass it as an additional parameter to MyFunc1; in the 'MyFunc' method iterate not through 'objects.Where(...)' but through the 'goodObjects' collection. Not too bad approach (as far as I see), but additional parameter is required to be passed. Question: is there any LinQ out-of-the-box functionality that allows any caching during 1st Where().Count(), remembering a processed collection and use it in the next iteration? Any thoughts are welcome. Thanks.

Read the article

Optimizing mathematics on arrays of floats in Ada 95 with GNAT

- by mat_geek

Consider the bellow code. This code is supposed to be processing data at a fixed rate, in one second batches, It is part of an overal system and can't take up too much time. When running over 100 lots of 1 seconds worth of data the program takes 35 seconds (or 35%), executing this function in a loop. The test loop is timed specifically with Ada.RealTime. The data is pregenerated so the majority of the execution time is definatetly in this loop. How do I improce the code to get the processing time down to a minimum? The code will be running on an Intel Pentium-M which is a P3 with SSE2. package FF is new Ada.Numerics.Generic_Elementary_Functions(Float); N : constant Integer := 820; type A is array(1 .. N) of Float; type A3 is array(1 .. 3) of A; procedure F(state : in out A3; result : out A3; l : in A; r : in A) is s : Float; t : Float; begin for i in 1 .. N loop t := l(i) + r(i); t := t / 2.0; state(1)(i) := t; state(2)(i) := t * 0.25 + state(2)(i) * 0.75; state(3)(i) := t * 1.0 /64.0 + state(2)(i) * 63.0 /64.0; for r in 1 .. 3 loop s := state(r)(i); t := FF."**"(s, 6.0) + 14.0; if t > MAX then t := MAX; elsif t < MIN then t := MIN; end if; result(r)(i) := FF.Log(t, 2.0); end loop; end loop; end; psuedocode for testing create two arrays of 80 random A3 arrays, called ls and rs; init the state and result A3 array record the realtime time now, called last for i in 1 .. 100 loop for j in 1 .. 80 loop F(state, result, ls(j), rs(j)); end loop; end loop; record the realtime time now, called curr output the duration between curr and last

Read the article

Should accessible members of an internal class be internal too?

- by Jeff Mercado

I'm designing a set of APIs for some applications I'm working on. I want to keep the code style consistent in all the classes I write but I've found that there are a few inconsistencies that I'm introducing and I don't know what the best way to resolve them is. My example here is specific to C# but this would apply to any language with similar mechanisms. There are a few classes that I need for implementation purposes that I don't necessarily want to expose in the API so I make them internal whereever needed. Generally what I would do is design the class as I normally would (e.g., make members public/protected/private where necessary) and change the visibility level of the class itself to internal. So I might have a few classes that look like this: internal interface IMyItem { ItemSet AddTo(ItemSet set); } internal class _SmallItem : IMyItem { private readonly /* parameters */; public _SmallItem(/* small item parameters */) { /* ... */ } public ItemSet AddTo(ItemSet set) { /* ... */ } } internal abstract class _CompositeItem: IMyItem { private readonly /* parameters */; public _CompositeItem(/* composite item parameters */) { /* ... */ } public abstract object UsefulInformation { get; } protected void HelperMethod(/* parameters */) { /* ... */ } } internal class _BigItem : _CompositeItem { private readonly /* parameters */; public _BigItem(/* big item parameters */) { /* ... */ } public override object UsefulInformation { get { /* ... */ } } public ItemSet AddTo(ItemSet set) { /* ... */ } } In another generated class (part of a parser/scanner), there is a structure that contains fields for all possible values it can represent. The class generated is internal too but I have control over the visibility of the members and decided to make them internal as well. internal partial struct ValueType { internal string String; internal ItemSet ItemSet; internal IMyItem MyItem; } internal class TokenValue { internal static int EQ(ItemSetScanner scanner) { /* ... */ } internal static int NAME(ItemSetScanner scanner, string value) { /* ... */ } internal static int VALUE(ItemSetScanner scanner, string value) { /* ... */ } //... } To me, this feels odd because the first set of classes, I didn't necessarily have to make some members public, they very well could have been made internal. internal members of an internal type can only be accessed internally anyway so why make them public? I just don't like the idea that the way I write my classes has to change drastically (i.e., change all uses of public to internal) just because the class is internal. Any thoughts on what I should do here? It makes sense to me that I might want to make some members of a class declared public, internal. But it's less clear to me when the class is declared internal.

Read the article

Best way to handle MySQL date for performance with thousands of users

- by bitLost

I am currently part of a team designing a site that will potentially have thousands of users who will be doing a number of date related searches. During the design phase we have been trying to determine which makes more sense for performance optimization. Should we store the datetime field as a mysql datetime. Or should be break it up into a number of fields (year, month, day, hour, minute, ...) The question is with a large data set and a potentially large set of users, would we gain performance wise breaking the datetime into multiple fields and saving on relying on mysql date functions? Or is mysql already optimized for this?

Read the article

web page db query optimisation

- by morpheous

I am putting together a web page which is quite 'expensive' in terms of Db hits. I dont want to start optimizing at this stage - though with me trying to hit a deadline, I may end up not optimising at all. Currently the page requires 18 (thats right eighteen) hits to the db. I am already using joins, and some of the queries are UNIONed to minimize the trips to the db. My local dev machine can handle this (page is not slow) however, I feel if I release this into the wild, the number of queries will quickly overwhelm my database (mySQL). I could always use memcache or something similar, but I would much rather continue with my other dev work that needs to be completed before the deadline - at least retrieving the page work, its simply a matter of optimization. My question therefore is - is 18 db queries for a single page retrieval completely outrageous - (i.e. I should put everything on hold and optimize the hell of the retrieval logic), or shall I continue as normal, meet the deadline and release on schedule and see what happens?

Read the article

Matlab: Optimization by perturbing variable

- by S_H

My main script contains following code: %# Grid and model parameters nModel=50; nModel_want=1; nI_grid1=5; Nth=1; nRow.Scale1=5; nCol.Scale1=5; nRow.Scale2=5^2; nCol.Scale2=5^2; theta = 90; % degrees a_minor = 2; % range along minor direction a_major = 5; % range along major direction sill = var(reshape(Deff_matrix_NthModel,nCell.Scale1,1)); % variance of the coarse data matrix of size nRow.Scale1 X nCol.Scale1 %# Covariance computation % Scale 1 for ihRow = 1:nRow.Scale1 for ihCol = 1:nCol.Scale1 [cov.Scale1(ihRow,ihCol),heff.Scale1(ihRow,ihCol)] = general_CovModel(theta, ihCol, ihRow, a_minor, a_major, sill, 'Exp'); end end % Scale 2 for ihRow = 1:nRow.Scale2 for ihCol = 1:nCol.Scale2 [cov.Scale2(ihRow,ihCol),heff.Scale2(ihRow,ihCol)] = general_CovModel(theta, ihCol/(nCol.Scale2/nCol.Scale1), ihRow/(nRow.Scale2/nRow.Scale1), a_minor, a_major, sill/(nRow.Scale2*nCol.Scale2), 'Exp'); end end %# Scale-up of fine scale values by averaging [covAvg.Scale2,var_covAvg.Scale2,varNorm_covAvg.Scale2] = general_AverageProperty(nRow.Scale2/nRow.Scale1,nCol.Scale2/nCol.Scale1,1,nRow.Scale1,nCol.Scale1,1,cov.Scale2,1); I am using two functions, general_CovModel() and general_AverageProperty(), in my main script which are given as following: function [cov,h_eff] = general_CovModel(theta, hx, hy, a_minor, a_major, sill, mod_type) % mod_type should be in strings angle_rad = theta*(pi/180); % theta in degrees, angle_rad in radians R_theta = [sin(angle_rad) cos(angle_rad); -cos(angle_rad) sin(angle_rad)]; h = [hx; hy]; lambda = a_minor/a_major; D_lambda = [lambda 0; 0 1]; h_2prime = D_lambda*R_theta*h; h_eff = sqrt((h_2prime(1)^2)+(h_2prime(2)^2)); if strcmp(mod_type,'Sph')==1 || strcmp(mod_type,'sph') ==1 if h_eff<=a cov = sill - sill.*(1.5*(h_eff/a_minor)-0.5*((h_eff/a_minor)^3)); else cov = sill; end elseif strcmp(mod_type,'Exp')==1 || strcmp(mod_type,'exp') ==1 cov = sill-(sill.*(1-exp(-(3*h_eff)/a_minor))); elseif strcmp(mod_type,'Gauss')==1 || strcmp(mod_type,'gauss') ==1 cov = sill-(sill.*(1-exp(-((3*h_eff)^2/(a_minor^2))))); end and function [PropertyAvg,variance_PropertyAvg,NormVariance_PropertyAvg]=... general_AverageProperty(blocksize_row,blocksize_col,blocksize_t,... nUpscaledRow,nUpscaledCol,nUpscaledT,PropertyArray,omega) % This function computes average of a property and variance of that averaged % property using power averaging PropertyAvg=zeros(nUpscaledRow,nUpscaledCol,nUpscaledT); %# Average of property for k=1:nUpscaledT, for j=1:nUpscaledCol, for i=1:nUpscaledRow, sum=0; for a=1:blocksize_row, for b=1:blocksize_col, for c=1:blocksize_t, sum=sum+(PropertyArray((i-1)*blocksize_row+a,(j-1)*blocksize_col+b,(k-1)*blocksize_t+c).^omega); % add all the property values in 'blocksize_x','blocksize_y','blocksize_t' to one variable end end end PropertyAvg(i,j,k)=(sum/(blocksize_row*blocksize_col*blocksize_t)).^(1/omega); % take average of the summed property end end end %# Variance of averageed property variance_PropertyAvg=var(reshape(PropertyAvg,... nUpscaledRow*nUpscaledCol*nUpscaledT,1),1,1); %# Normalized variance of averageed property NormVariance_PropertyAvg=variance_PropertyAvg./(var(reshape(... PropertyArray,numel(PropertyArray),1),1,1)); Question: Using Matlab, I would like to optimize covAvg.Scale2 such that it matches closely with cov.Scale1 by perturbing/varying any (or all) of the following variables 1) a_minor 2) a_major 3) theta Thanks.

Read the article

Python: Memory usage and optimization when modifying lists

- by xApple

The problem My concern is the following: I am storing a relativity large dataset in a classical python list and in order to process the data I must iterate over the list several times, perform some operations on the elements, and often pop an item out of the list. It seems that deleting one item out of a Python list costs O(N) since Python has to copy all the items above the element at hand down one place. Furthermore, since the number of items to delete is approximately proportional to the number of elements in the list this results in an O(N^2) algorithm. I am hoping to find a solution that is cost effective (time and memory-wise). I have studied what I could find on the internet and have summarized my different options below. Which one is the best candidate ? Keeping a local index: while processingdata: index = 0 while index < len(somelist): item = somelist[index] dosomestuff(item) if somecondition(item): del somelist[index] else: index += 1 This is the original solution I came up with. Not only is this not very elegant, but I am hoping there is better way to do it that remains time and memory efficient. Walking the list backwards: while processingdata: for i in xrange(len(somelist) - 1, -1, -1): dosomestuff(item) if somecondition(somelist, i): somelist.pop(i) This avoids incrementing an index variable but ultimately has the same cost as the original version. It also breaks the logic of dosomestuff(item) that wishes to process them in the same order as they appear in the original list. Making a new list: while processingdata: for i, item in enumerate(somelist): dosomestuff(item) newlist = [] for item in somelist: if somecondition(item): newlist.append(item) somelist = newlist gc.collect() This is a very naive strategy for eliminating elements from a list and requires lots of memory since an almost full copy of the list must be made. Using list comprehensions: while processingdata: for i, item in enumerate(somelist): dosomestuff(item) somelist[:] = [x for x in somelist if somecondition(x)] This is very elegant but under-the-cover it walks the whole list one more time and must copy most of the elements in it. My intuition is that this operation probably costs more than the original del statement at least memory wise. Keep in mind that somelist can be huge and that any solution that will iterate through it only once per run will probably always win. Using the filter function: while processingdata: for i, item in enumerate(somelist): dosomestuff(item) somelist = filter(lambda x: not subtle_condition(x), somelist) This also creates a new list occupying lots of RAM. Using the itertools' filter function: from itertools import ifilterfalse while processingdata: for item in itertools.ifilterfalse(somecondtion, somelist): dosomestuff(item) This version of the filter call does not create a new list but will not call dosomestuff on every item breaking the logic of the algorithm. I am including this example only for the purpose of creating an exhaustive list. Moving items up the list while walking while processingdata: index = 0 for item in somelist: dosomestuff(item) if not somecondition(item): somelist[index] = item index += 1 del somelist[index:] This is a subtle method that seems cost effective. I think it will move each item (or the pointer to each item ?) exactly once resulting in an O(N) algorithm. Finally, I hope Python will be intelligent enough to resize the list at the end without allocating memory for a new copy of the list. Not sure though. Abandoning Python lists: class Doubly_Linked_List: def __init__(self): self.first = None self.last = None self.n = 0 def __len__(self): return self.n def __iter__(self): return DLLIter(self) def iterator(self): return self.__iter__() def append(self, x): x = DLLElement(x) x.next = None if self.last is None: x.prev = None self.last = x self.first = x self.n = 1 else: x.prev = self.last x.prev.next = x self.last = x self.n += 1 class DLLElement: def __init__(self, x): self.next = None self.data = x self.prev = None class DLLIter: etc... This type of object resembles a python list in a limited way. However, deletion of an element is guaranteed O(1). I would not like to go here since this would require massive amounts of code refactoring almost everywhere.

Read the article

C# 'is' type check on struct - odd .NET 4.0 x86 optimization behavior

- by Jacob Stanley

Since upgrading to VS2010 I'm getting some very strange behavior with the 'is' keyword. The program below (test.cs) outputs True when compiled in debug mode (for x86) and False when compiled with optimizations on (for x86). Compiling all combinations in x64 or AnyCPU gives the expected result, True. All combinations of compiling under .NET 3.5 give the expected result, True. I'm using the batch file below (runtest.bat) to compile and test the code using various combinations of compiler .NET framework. Has anyone else seen these kind of problems under .NET 4.0? Does everyone else see the same behavior as me on their computer when running runtests.bat? #@$@#$?? Is there a fix for this? test.cs using System; public class Program { public static bool IsGuid(object item) { return item is Guid; } public static void Main() { Console.Write(IsGuid(Guid.NewGuid())); } } runtest.bat @echo off rem Usage: rem runtest -- runs with csc.exe x86 .NET 4.0 rem runtest 64 -- runs with csc.exe x64 .NET 4.0 rem runtest v3.5 -- runs with csc.exe x86 .NET 3.5 rem runtest v3.5 64 -- runs with csc.exe x64 .NET 3.5 set version=v4.0.30319 set platform=Framework for %%a in (%*) do ( if "%%a" == "64" (set platform=Framework64) if "%%a" == "v3.5" (set version=v3.5) ) echo Compiler: %platform%\%version%\csc.exe set csc="C:\Windows\Microsoft.NET\%platform%\%version%\csc.exe" set make=%csc% /nologo /nowarn:1607 test.cs rem CS1607: Referenced assembly targets a different processor rem This happens if you compile for x64 using csc32, or x86 using csc64 %make% /platform:x86 test.exe echo =^> x86 %make% /platform:x86 /optimize test.exe echo =^> x86 (Optimized) %make% /platform:x86 /debug test.exe echo =^> x86 (Debug) %make% /platform:x86 /debug /optimize test.exe echo =^> x86 (Debug + Optimized) %make% /platform:x64 test.exe echo =^> x64 %make% /platform:x64 /optimize test.exe echo =^> x64 (Optimized) %make% /platform:x64 /debug test.exe echo =^> x64 (Debug) %make% /platform:x64 /debug /optimize test.exe echo =^> x64 (Debug + Optimized) %make% /platform:AnyCPU test.exe echo =^> AnyCPU %make% /platform:AnyCPU /optimize test.exe echo =^> AnyCPU (Optimized) %make% /platform:AnyCPU /debug test.exe echo =^> AnyCPU (Debug) %make% /platform:AnyCPU /debug /optimize test.exe echo =^> AnyCPU (Debug + Optimized) Test Results When running the runtest.bat I get the following results on my Win7 x64 install. > runtest 32 v4.0 Compiler: Framework\v4.0.30319\csc.exe False => x86 False => x86 (Optimized) True => x86 (Debug) False => x86 (Debug + Optimized) True => x64 True => x64 (Optimized) True => x64 (Debug) True => x64 (Debug + Optimized) True => AnyCPU True => AnyCPU (Optimized) True => AnyCPU (Debug) True => AnyCPU (Debug + Optimized) > runtest 64 v4.0 Compiler: Framework64\v4.0.30319\csc.exe False => x86 False => x86 (Optimized) True => x86 (Debug) False => x86 (Debug + Optimized) True => x64 True => x64 (Optimized) True => x64 (Debug) True => x64 (Debug + Optimized) True => AnyCPU True => AnyCPU (Optimized) True => AnyCPU (Debug) True => AnyCPU (Debug + Optimized) > runtest 32 v3.5 Compiler: Framework\v3.5\csc.exe True => x86 True => x86 (Optimized) True => x86 (Debug) True => x86 (Debug + Optimized) True => x64 True => x64 (Optimized) True => x64 (Debug) True => x64 (Debug + Optimized) True => AnyCPU True => AnyCPU (Optimized) True => AnyCPU (Debug) True => AnyCPU (Debug + Optimized) > runtest 64 v3.5 Compiler: Framework64\v3.5\csc.exe True => x86 True => x86 (Optimized) True => x86 (Debug) True => x86 (Debug + Optimized) True => x64 True => x64 (Optimized) True => x64 (Debug) True => x64 (Debug + Optimized) True => AnyCPU True => AnyCPU (Optimized) True => AnyCPU (Debug) True => AnyCPU (Debug + Optimized) tl;dr

Read the article

Odd optimization problem under MSVC

- by Goz

I've seen this blog: http://igoro.com/archive/gallery-of-processor-cache-effects/ The "weirdness" in part 7 is what caught my interest. My first thought was "Thats just C# being weird". Its not I wrote the following C++ code. volatile int* p = (volatile int*)_aligned_malloc( sizeof( int ) * 8, 64 ); memset( (void*)p, 0, sizeof( int ) * 8 ); double dStart = t.GetTime(); for (int i = 0; i < 200000000; i++) { //p[0]++;p[1]++;p[2]++;p[3]++; // Option 1 //p[0]++;p[2]++;p[4]++;p[6]++; // Option 2 p[0]++;p[2]++; // Option 3 } double dTime = t.GetTime() - dStart; The timing I get on my 2.4 Ghz Core 2 Quad go as follows: Option 1 = ~8 cycles per loop. Option 2 = ~4 cycles per loop. Option 3 = ~6 cycles per loop. Now This is confusing. My reasoning behind the difference comes down to the cache write latency (3 cycles) on my chip and an assumption that the cache has a 128-bit write port (This is pure guess work on my part). On that basis in Option 1: It will increment p[0] (1 cycle) then increment p[2] (1 cycle) then it has to wait 1 cycle (for cache) then p[1] (1 cycle) then wait 1 cycle (for cache) then p[3] (1 cycle). Finally 2 cycles for increment and jump (Though its usually implemented as decrement and jump). This gives a total of 8 cycles. In Option 2: It can increment p[0] and p[4] in one cycle then increment p[2] and p[6] in another cycle. Then 2 cycles for subtract and jump. No waits needed on cache. Total 4 cycles. In option 3: It can increment p[0] then has to wait 2 cycles then increment p[2] then subtract and jump. The problem is if you set case 3 to increment p[0] and p[4] it STILL takes 6 cycles (which kinda blows my 128-bit read/write port out of the water). So ... can anyone tell me what the hell is going on here? Why DOES case 3 take longer? Also I'd love to know what I've got wrong in my thinking above, as i obviously have something wrong! Any ideas would be much appreciated! :) It'd also be interesting to see how GCC or any other compiler copes with it as well! Edit: Jerry Coffin's idea gave me some thoughts. I've done some more tests (on a different machine so forgive the change in timings) with and without nops and with different counts of nops case 2 - 0.46 00401ABD jne (401AB0h) 0 nops - 0.68 00401AB7 jne (401AB0h) 1 nop - 0.61 00401AB8 jne (401AB0h) 2 nops - 0.636 00401AB9 jne (401AB0h) 3 nops - 0.632 00401ABA jne (401AB0h) 4 nops - 0.66 00401ABB jne (401AB0h) 5 nops - 0.52 00401ABC jne (401AB0h) 6 nops - 0.46 00401ABD jne (401AB0h) 7 nops - 0.46 00401ABE jne (401AB0h) 8 nops - 0.46 00401ABF jne (401AB0h) 9 nops - 0.55 00401AC0 jne (401AB0h) I've included the jump statetements so you can see that the source and destination are in one cache line. You can also see that we start to get a difference when we are 13 bytes or more apart. Until we hit 16 ... then it all goes wrong. So Jerry isn't right (though his suggestion DOES help a bit), however something IS going on. I'm more and more intrigued to try and figure out what it is now. It does appear to be more some sort of memory alignment oddity rather than some sort of instruction throughput oddity. Anyone want to explain this for an inquisitive mind? :D Edit 3: Interjay has a point on the unrolling that blows the previous edit out of the water. With an unrolled loop the performance does not improve. You need to add a nop in to make the gap between jump source and destination the same as for my good nop count above. Performance still sucks. Its interesting that I need 6 nops to improve performance though. I wonder how many nops the processor can issue per cycle? If its 3 then that account for the cache write latency ... But, if thats it, why is the latency occurring? Curiouser and curiouser ...

Read the article

Image/"most resembling pixel" search optimization?

- by SigTerm

The situation: Let's say I have an image A, say, 512x512 pixels, and image B, 5x5 or 7x7 pixels. Both images are 24bit rgb, and B have 1bit alpha mask (so each pixel is either completely transparent or completely solid). I need to find within image A a pixel which (with its' neighbors) most closely resembles image B, OR the pixel that probably most closely resembles image B. Resemblance is calculated as "distance" which is sum of "distances" between non-transparent B's pixels and A's pixels divided by number of non-transparent B's pixels. Here is a sample SDL code for explanation: struct Pixel{ unsigned char b, g, r, a; }; void fillPixel(int x, int y, SDL_Surface* dst, SDL_Surface* src, int dstMaskX, int dstMaskY){ Pixel& dstPix = *((Pixel*)((char*)(dst->pixels) + sizeof(Pixel)*x + dst->pitch*y)); int xMin = x + texWidth - searchWidth; int xMax = xMin + searchWidth*2; int yMin = y + texHeight - searchHeight; int yMax = yMin + searchHeight*2; int numFilled = 0; for (int curY = yMin; curY < yMax; curY++) for (int curX = xMin; curX < xMax; curX++){ Pixel& cur = *((Pixel*)((char*)(dst->pixels) + sizeof(Pixel)*(curX & texMaskX) + dst->pitch*(curY & texMaskY))); if (cur.a != 0) numFilled++; } if (numFilled == 0){ int srcX = rand() % src->w; int srcY = rand() % src->h; dstPix = *((Pixel*)((char*)(src->pixels) + sizeof(Pixel)*srcX + src->pitch*srcY)); dstPix.a = 0xFF; return; } int storedSrcX = rand() % src->w; int storedSrcY = rand() % src->h; float lastDifference = 3.40282347e+37F; //unsigned char mask = for (int srcY = searchHeight; srcY < (src->h - searchHeight); srcY++) for (int srcX = searchWidth; srcX < (src->w - searchWidth); srcX++){ float curDifference = 0; int numPixels = 0; for (int tmpY = -searchHeight; tmpY < searchHeight; tmpY++) for(int tmpX = -searchWidth; tmpX < searchWidth; tmpX++){ Pixel& tmpSrc = *((Pixel*)((char*)(src->pixels) + sizeof(Pixel)*(srcX+tmpX) + src->pitch*(srcY+tmpY))); Pixel& tmpDst = *((Pixel*)((char*)(dst->pixels) + sizeof(Pixel)*((x + dst->w + tmpX) & dstMaskX) + dst->pitch*((y + dst->h + tmpY) & dstMaskY))); if (tmpDst.a){ numPixels++; int dr = tmpSrc.r - tmpDst.r; int dg = tmpSrc.g - tmpDst.g; int db = tmpSrc.g - tmpDst.g; curDifference += dr*dr + dg*dg + db*db; } } if (numPixels) curDifference /= (float)numPixels; if (curDifference < lastDifference){ lastDifference = curDifference; storedSrcX = srcX; storedSrcY = srcY; } } dstPix = *((Pixel*)((char*)(src->pixels) + sizeof(Pixel)*storedSrcX + src->pitch*storedSrcY)); dstPix.a = 0xFF; } This thing is supposed to be used for texture generation. Now, the question: The easiest way to do this is brute force search (which is used in example routine). But it is slow - even using GPU acceleration and dual core cpu won't make it much faster. It looks like I can't use modified binary search because of B's mask. So, how can I find desired pixel faster? Additional Info: It is allowed to use 2 cores, GPU acceleration, CUDA, and 1.5..2 gigabytes of RAM for the task. I would prefer to avoid some kind of lengthy preprocessing phase that will take 30 minutes to finish. Ideas?

Read the article

Does my basic PHP Socket Server need optimization?

- by Tom

Like many people, I can do a lot of things with PHP. One problem I do face constantly is that other people can do it much cleaner, much more organized and much more structured. This also results in much faster execution times and much less bugs. I just finished writing a basic PHP Socket Server (the real core), and am asking you if you can tell me what I should do different before I start expanding the core. I'm not asking about improvements such as encrypted data, authentication or multi-threading. I'm more wondering about questions like "should I maybe do it in a more object oriented way (using PHP5)?", or "is the general structure of the way the script works good, or should some things be done different?". Basically, "is this how the core of a socket server should work?" In fact, I think that if I just show you the code here many of you will immediately see room for improvements. Please be so kind to tell me. Thanks! #!/usr/bin/php -q <? // config $timelimit = 180; // amount of seconds the server should run for, 0 = run indefintely $address = $_SERVER['SERVER_ADDR']; // the server's external IP $port = 9000; // the port to listen on $backlog = SOMAXCONN; // the maximum of backlog incoming connections that will be queued for processing // configure custom PHP settings error_reporting(1); // report all errors ini_set('display_errors', 1); // display all errors set_time_limit($timelimit); // timeout after x seconds ob_implicit_flush(); // results in a flush operation after every output call //create master IPv4 based TCP socket if (!($master = socket_create(AF_INET, SOCK_STREAM, SOL_TCP))) die("Could not create master socket, error: ".socket_strerror(socket_last_error())); // set socket options (local addresses can be reused) if (!socket_set_option($master, SOL_SOCKET, SO_REUSEADDR, 1)) die("Could not set socket options, error: ".socket_strerror(socket_last_error())); // bind to socket server if (!socket_bind($master, $address, $port)) die("Could not bind to socket server, error: ".socket_strerror(socket_last_error())); // start listening if (!socket_listen($master, $backlog)) die("Could not start listening to socket, error: ".socket_strerror(socket_last_error())); //display startup information echo "[".date('Y-m-d H:i:s')."] SERVER CREATED (MAXCONN: ".SOMAXCONN.").\n"; //max connections is a kernel variable and can be adjusted with sysctl echo "[".date('Y-m-d H:i:s')."] Listening on ".$address.":".$port.".\n"; $time = time(); //set startup timestamp // init read sockets array $read_sockets = array($master); // continuously handle incoming socket messages, or close if time limit has been reached while ((!$timelimit) or (time() - $time < $timelimit)) { $changed_sockets = $read_sockets; socket_select($changed_sockets, $write = null, $except = null, null); foreach($changed_sockets as $socket) { if ($socket == $master) { if (($client = socket_accept($master)) < 0) { echo "[".date('Y-m-d H:i:s')."] Socket_accept() failed, error: ".socket_strerror(socket_last_error())."\n"; continue; } else { array_push($read_sockets, $client); echo "[".date('Y-m-d H:i:s')."] Client #".count($read_sockets)." connected (connections: ".count($read_sockets)."/".SOMAXCONN.")\n"; } } else { $data = @socket_read($socket, 1024, PHP_NORMAL_READ); //read a maximum of 1024 bytes until a new line has been sent if ($data === false) { //the client disconnected $index = array_search($socket, $read_sockets); unset($read_sockets[$index]); socket_close($socket); echo "[".date('Y-m-d H:i:s')."] Client #".($index-1)." disconnected (connections: ".count($read_sockets)."/".SOMAXCONN.")\n"; } else { if ($data = trim($data)) { //remove whitespace and continue only if the message is not empty switch ($data) { case "exit": //close connection when exit command is given $index = array_search($socket, $read_sockets); unset($read_sockets[$index]); socket_close($socket); echo "[".date('Y-m-d H:i:s')."] Client #".($index-1)." disconnected (connections: ".count($read_sockets)."/".SOMAXCONN.")\n"; break; default: //for experimental purposes, write the given data back socket_write($socket, "\n you wrote: ".$data); } } } } } } socket_close($master); //close the socket echo "[".date('Y-m-d H:i:s')."] SERVER CLOSED.\n"; ?>

Read the article

What is this code?

- by Aerovistae

This is from the Evolution of a Programmer "joke", at the "Master Programmer" level. It seems to be C++, but I don't know what all this bloated extra stuff is, nor did any Google searches turn up anything except the joke I took it from. Can anyone tell me more about what I'm reading here? [ uuid(2573F8F4-CFEE-101A-9A9F-00AA00342820) ] library LHello { // bring in the master library importlib("actimp.tlb"); importlib("actexp.tlb"); // bring in my interfaces #include "pshlo.idl" [ uuid(2573F8F5-CFEE-101A-9A9F-00AA00342820) ] cotype THello { interface IHello; interface IPersistFile; }; }; [ exe, uuid(2573F890-CFEE-101A-9A9F-00AA00342820) ] module CHelloLib { // some code related header files importheader(<windows.h>); importheader(<ole2.h>); importheader(<except.hxx>); importheader("pshlo.h"); importheader("shlo.hxx"); importheader("mycls.hxx"); // needed typelibs importlib("actimp.tlb"); importlib("actexp.tlb"); importlib("thlo.tlb"); [ uuid(2573F891-CFEE-101A-9A9F-00AA00342820), aggregatable ] coclass CHello { cotype THello; }; }; #include "ipfix.hxx" extern HANDLE hEvent; class CHello : public CHelloBase { public: IPFIX(CLSID_CHello); CHello(IUnknown *pUnk); ~CHello(); HRESULT __stdcall PrintSz(LPWSTR pwszString); private: static int cObjRef; }; #include <windows.h> #include <ole2.h> #include <stdio.h> #include <stdlib.h> #include "thlo.h" #include "pshlo.h" #include "shlo.hxx" #include "mycls.hxx" int CHello:cObjRef = 0; CHello::CHello(IUnknown *pUnk) : CHelloBase(pUnk) { cObjRef++; return; } HRESULT __stdcall CHello::PrintSz(LPWSTR pwszString) { printf("%ws\n", pwszString); return(ResultFromScode(S_OK)); } CHello::~CHello(void) { // when the object count goes to zero, stop the server cObjRef--; if( cObjRef == 0 ) PulseEvent(hEvent); return; } #include <windows.h> #include <ole2.h> #include "pshlo.h" #include "shlo.hxx" #include "mycls.hxx" HANDLE hEvent; int _cdecl main( int argc, char * argv[] ) { ULONG ulRef; DWORD dwRegistration; CHelloCF *pCF = new CHelloCF(); hEvent = CreateEvent(NULL, FALSE, FALSE, NULL); // Initialize the OLE libraries CoInitiali, NULL); // Initialize the OLE libraries CoInitializeEx(NULL, COINIT_MULTITHREADED); CoRegisterClassObject(CLSID_CHello, pCF, CLSCTX_LOCAL_SERVER, REGCLS_MULTIPLEUSE, &dwRegistration); // wait on an event to stop WaitForSingleObject(hEvent, INFINITE); // revoke and release the class object CoRevokeClassObject(dwRegistration); ulRef = pCF->Release(); // Tell OLE we are going away. CoUninitialize(); return(0); } extern CLSID CLSID_CHello; extern UUID LIBID_CHelloLib; CLSID CLSID_CHello = { /* 2573F891-CFEE-101A-9A9F-00AA00342820 */ 0x2573F891, 0xCFEE, 0x101A, { 0x9A, 0x9F, 0x00, 0xAA, 0x00, 0x34, 0x28, 0x20 } }; UUID LIBID_CHelloLib = { /* 2573F890-CFEE-101A-9A9F-00AA00342820 */ 0x2573F890, 0xCFEE, 0x101A, { 0x9A, 0x9F, 0x00, 0xAA, 0x00, 0x34, 0x28, 0x20 } }; #include <windows.h> #include <ole2.h> #include <stdlib.h> #include <string.h> #include <stdio.h> #include "pshlo.h" #include "shlo.hxx" #include "clsid.h" int _cdecl main( int argc, char * argv[] ) { HRESULT hRslt; IHello *pHello; ULONG ulCnt; IMoniker * pmk; WCHAR wcsT[_MAX_PATH]; WCHAR wcsPath[2 * _MAX_PATH]; // get object path wcsPath[0] = '\0'; wcsT[0] = '\0'; if( argc > 1) { mbstowcs(wcsPath, argv[1], strlen(argv[1]) + 1); wcsupr(wcsPath); } else { fprintf(stderr, "Object path must be specified\n"); return(1); } // get print string if(argc > 2) mbstowcs(wcsT, argv[2], strlen(argv[2]) + 1); else wcscpy(wcsT, L"Hello World"); printf("Linking to object %ws\n", wcsPath); printf("Text String %ws\n", wcsT); // Initialize the OLE libraries hRslt = CoInitializeEx(NULL, COINIT_MULTITHREADED); if(SUCCEEDED(hRslt)) { hRslt = CreateFileMoniker(wcsPath, &pmk); if(SUCCEEDED(hRslt)) hRslt = BindMoniker(pmk, 0, IID_IHello, (void **)&pHello); if(SUCCEEDED(hRslt)) { // print a string out pHello->PrintSz(wcsT); Sleep(2000); ulCnt = pHello->Release(); } else printf("Failure to connect, status: %lx", hRslt); // Tell OLE we are going away. CoUninitialize(); } return(0); }

Read the article

Java code optimization on matrix windowing computes in more time

- by rano

I have a matrix which represents an image and I need to cycle over each pixel and for each one of those I have to compute the sum of all its neighbors, ie the pixels that belong to a window of radius rad centered on the pixel. I came up with three alternatives: The simplest way, the one that recomputes the window for each pixel The more optimized way that uses a queue to store the sums of the window columns and cycling through the columns of the matrix updates this queue by adding a new element and removing the oldes The even more optimized way that does not need to recompute the queue for each row but incrementally adjusts a previously saved one I implemented them in c++ using a queue for the second method and a combination of deques for the third (I need to iterate through their elements without destructing them) and scored their times to see if there was an actual improvement. it appears that the third method is indeed faster. Then I tried to port the code to Java (and I must admit that I'm not very comfortable with it). I used ArrayDeque for the second method and LinkedLists for the third resulting in the third being inefficient in time. Here is the simplest method in C++ (I'm not posting the java version since it is almost identical): void normalWindowing(int mat[][MAX], int cols, int rows, int rad){ int i, j; int h = 0; for (i = 0; i < rows; ++i) { for (j = 0; j < cols; j++) { h = 0; for (int ry =- rad; ry <= rad; ry++) { int y = i + ry; if (y >= 0 && y < rows) { for (int rx =- rad; rx <= rad; rx++) { int x = j + rx; if (x >= 0 && x < cols) { h += mat[y][x]; } } } } } } } Here is the second method (the one optimized through columns) in C++: void opt1Windowing(int mat[][MAX], int cols, int rows, int rad){ int i, j, h, y, col; queue<int>* q = NULL; for (i = 0; i < rows; ++i) { if (q != NULL) delete(q); q = new queue<int>(); h = 0; for (int rx = 0; rx <= rad; rx++) { if (rx < cols) { int mem = 0; for (int ry =- rad; ry <= rad; ry++) { y = i + ry; if (y >= 0 && y < rows) { mem += mat[y][rx]; } } q->push(mem); h += mem; } } for (j = 1; j < cols; j++) { col = j + rad; if (j - rad > 0) { h -= q->front(); q->pop(); } if (j + rad < cols) { int mem = 0; for (int ry =- rad; ry <= rad; ry++) { y = i + ry; if (y >= 0 && y < rows) { mem += mat[y][col]; } } q->push(mem); h += mem; } } } } And here is the Java version: public static void opt1Windowing(int [][] mat, int rad){ int i, j = 0, h, y, col; int cols = mat[0].length; int rows = mat.length; ArrayDeque<Integer> q = null; for (i = 0; i < rows; ++i) { q = new ArrayDeque<Integer>(); h = 0; for (int rx = 0; rx <= rad; rx++) { if (rx < cols) { int mem = 0; for (int ry =- rad; ry <= rad; ry++) { y = i + ry; if (y >= 0 && y < rows) { mem += mat[y][rx]; } } q.addLast(mem); h += mem; } } j = 0; for (j = 1; j < cols; j++) { col = j + rad; if (j - rad > 0) { h -= q.peekFirst(); q.pop(); } if (j + rad < cols) { int mem = 0; for (int ry =- rad; ry <= rad; ry++) { y = i + ry; if (y >= 0 && y < rows) { mem += mat[y][col]; } } q.addLast(mem); h += mem; } } } } I recognize this post will be a wall of text. Here is the third method in C++: void opt2Windowing(int mat[][MAX], int cols, int rows, int rad){ int i = 0; int j = 0; int h = 0; int hh = 0; deque< deque<int> *> * M = new deque< deque<int> *>(); for (int ry = 0; ry <= rad; ry++) { if (ry < rows) { deque<int> * q = new deque<int>(); M->push_back(q); for (int rx = 0; rx <= rad; rx++) { if (rx < cols) { int val = mat[ry][rx]; q->push_back(val); h += val; } } } } deque<int> * C = new deque<int>(M->front()->size()); deque<int> * Q = new deque<int>(M->front()->size()); deque<int> * R = new deque<int>(M->size()); deque< deque<int> *>::iterator mit; deque< deque<int> *>::iterator mstart = M->begin(); deque< deque<int> *>::iterator mend = M->end(); deque<int>::iterator rit; deque<int>::iterator rstart = R->begin(); deque<int>::iterator rend = R->end(); deque<int>::iterator cit; deque<int>::iterator cstart = C->begin(); deque<int>::iterator cend = C->end(); for (mit = mstart, rit = rstart; mit != mend, rit != rend; ++mit, ++rit) { deque<int>::iterator pit; deque<int>::iterator pstart = (* mit)->begin(); deque<int>::iterator pend = (* mit)->end(); for(cit = cstart, pit = pstart; cit != cend && pit != pend; ++cit, ++pit) { (* cit) += (* pit); (* rit) += (* pit); } } for (i = 0; i < rows; ++i) { j = 0; if (i - rad > 0) { deque<int>::iterator cit; deque<int>::iterator cstart = C->begin(); deque<int>::iterator cend = C->end(); deque<int>::iterator pit; deque<int>::iterator pstart = (M->front())->begin(); deque<int>::iterator pend = (M->front())->end(); for(cit = cstart, pit = pstart; cit != cend; ++cit, ++pit) { (* cit) -= (* pit); } deque<int> * k = M->front(); M->pop_front(); delete k; h -= R->front(); R->pop_front(); } int row = i + rad; if (row < rows && i > 0) { deque<int> * newQ = new deque<int>(); M->push_back(newQ); deque<int>::iterator cit; deque<int>::iterator cstart = C->begin(); deque<int>::iterator cend = C->end(); int rx; int tot = 0; for (rx = 0, cit = cstart; rx <= rad; rx++, ++cit) { if (rx < cols) { int val = mat[row][rx]; newQ->push_back(val); (* cit) += val; tot += val; } } R->push_back(tot); h += tot; } hh = h; copy(C->begin(), C->end(), Q->begin()); for (j = 1; j < cols; j++) { int col = j + rad; if (j - rad > 0) { hh -= Q->front(); Q->pop_front(); } if (j + rad < cols) { int val = 0; for (int ry =- rad; ry <= rad; ry++) { int y = i + ry; if (y >= 0 && y < rows) { val += mat[y][col]; } } hh += val; Q->push_back(val); } } } } And finally its Java version: public static void opt2Windowing(int [][] mat, int rad){ int cols = mat[0].length; int rows = mat.length; int i = 0; int j = 0; int h = 0; int hh = 0; LinkedList<LinkedList<Integer>> M = new LinkedList<LinkedList<Integer>>(); for (int ry = 0; ry <= rad; ry++) { if (ry < rows) { LinkedList<Integer> q = new LinkedList<Integer>(); M.addLast(q); for (int rx = 0; rx <= rad; rx++) { if (rx < cols) { int val = mat[ry][rx]; q.addLast(val); h += val; } } } } int firstSize = M.getFirst().size(); int mSize = M.size(); LinkedList<Integer> C = new LinkedList<Integer>(); LinkedList<Integer> Q = null; LinkedList<Integer> R = new LinkedList<Integer>(); for (int k = 0; k < firstSize; k++) { C.add(0); } for (int k = 0; k < mSize; k++) { R.add(0); } ListIterator<LinkedList<Integer>> mit; ListIterator<Integer> rit; ListIterator<Integer> cit; ListIterator<Integer> pit; for (mit = M.listIterator(), rit = R.listIterator(); mit.hasNext();) { Integer r = rit.next(); int rsum = 0; for (cit = C.listIterator(), pit = (mit.next()).listIterator(); cit.hasNext();) { Integer c = cit.next(); Integer p = pit.next(); rsum += p; cit.set(c + p); } rit.set(r + rsum); } for (i = 0; i < rows; ++i) { j = 0; if (i - rad > 0) { for(cit = C.listIterator(), pit = M.getFirst().listIterator(); cit.hasNext();) { Integer c = cit.next(); Integer p = pit.next(); cit.set(c - p); } M.removeFirst(); h -= R.getFirst(); R.removeFirst(); } int row = i + rad; if (row < rows && i > 0) { LinkedList<Integer> newQ = new LinkedList<Integer>(); M.addLast(newQ); int rx; int tot = 0; for (rx = 0, cit = C.listIterator(); rx <= rad; rx++) { if (rx < cols) { Integer c = cit.next(); int val = mat[row][rx]; newQ.addLast(val); cit.set(c + val); tot += val; } } R.addLast(tot); h += tot; } hh = h; Q = new LinkedList<Integer>(); Q.addAll(C); for (j = 1; j < cols; j++) { int col = j + rad; if (j - rad > 0) { hh -= Q.getFirst(); Q.pop(); } if (j + rad < cols) { int val = 0; for (int ry =- rad; ry <= rad; ry++) { int y = i + ry; if (y >= 0 && y < rows) { val += mat[y][col]; } } hh += val; Q.addLast(val); } } } } I guess that most is due to the poor choice of the LinkedList in Java and to the lack of an efficient (not shallow) copy method between two LinkedList. How can I improve the third Java method? Am I doing some conceptual error? As always, any criticisms is welcome. UPDATE Even if it does not solve the issue, using ArrayLists, as being suggested, instead of LinkedList improves the third method. The second one performs still better (but when the number of rows and columns of the matrix is lower than 300 and the window radius is small the first unoptimized method is the fastest in Java)

Read the article

Shadow volume shader optimization (GLSL)

- by Soubok

I wondering if there is a way to optimize this vertex shader. This vertex shader projects (in the light direction) a vertex to the far plane if it is in the shadow. void main(void) { vec3 lightDir = (gl_ModelViewMatrix * gl_Vertex - gl_LightSource[0].position).xyz; // if the vertex is lit if ( dot(lightDir, gl_NormalMatrix * gl_Normal) < 0.01 ) { // don't move it gl_Position = ftransform(); } else { // move it far, is the light direction vec4 fin = gl_ProjectionMatrix * ( gl_ModelViewMatrix * gl_Vertex + vec4(normalize(lightDir) * 100000.0, 0.0) ); if ( fin.z > fin.w ) // if fin is behind the far plane fin.z = fin.w; // move to the far plane (needed for z-fail algo.) gl_Position = fin; } }

Read the article

MySql query optimization help

- by rohitgu

I have few queries and am not able to figure out how to optimize them, QUERY 1 select * from t_twitter_tracking where classified is null and tweetType='ENGLISH' order by id limit 500; QUERY 2 Select count(*) as cnt, DATE_FORMAT(CONVERT_TZ(wrdTrk.createdOnGMTDate,'+00:00','+05:30'),'%Y-%m-%d') as dat from t_twitter_tracking wrdTrk where wrdTrk.word like ('dell') and CONVERT_TZ(wrdTrk.createdOnGMTDate,'+00:00','+05:30') between '2010-12-12 00:00:00' and '2010-12-26 00:00:00' group by dat; Both these queries run on the same table, CREATE TABLE `t_twitter_tracking` ( `id` BIGINT(20) NOT NULL AUTO_INCREMENT, `word` VARCHAR(200) NOT NULL, `tweetId` BIGINT(100) NOT NULL, `twtText` VARCHAR(800) NULL DEFAULT NULL, `language` TEXT NULL, `links` TEXT NULL, `tweetType` VARCHAR(20) NULL DEFAULT NULL, `source` TEXT NULL, `sourceStripped` TEXT NULL, `isTruncated` VARCHAR(40) NULL DEFAULT NULL, `inReplyToStatusId` BIGINT(30) NULL DEFAULT NULL, `inReplyToUserId` INT(11) NULL DEFAULT NULL, `rtUsrProfilePicUrl` TEXT NULL, `isFavorited` VARCHAR(40) NULL DEFAULT NULL, `inReplyToScreenName` VARCHAR(40) NULL DEFAULT NULL, `latitude` BIGINT(100) NOT NULL, `longitude` BIGINT(100) NOT NULL, `retweetedStatus` VARCHAR(40) NULL DEFAULT NULL, `statusInReplyToStatusId` BIGINT(100) NOT NULL, `statusInReplyToUserId` BIGINT(100) NOT NULL, `statusFavorited` VARCHAR(40) NULL DEFAULT NULL, `statusInReplyToScreenName` TEXT NULL, `screenName` TEXT NULL, `profilePicUrl` TEXT NULL, `twitterId` BIGINT(100) NOT NULL, `name` TEXT NULL, `location` VARCHAR(100) NULL DEFAULT NULL, `bio` TEXT NULL, `url` TEXT NULL COLLATE 'latin1_swedish_ci', `utcOffset` INT(11) NULL DEFAULT NULL, `timeZone` VARCHAR(100) NULL DEFAULT NULL, `frenCnt` BIGINT(20) NULL DEFAULT '0', `createdAt` DATETIME NULL DEFAULT NULL, `createdOnGMT` VARCHAR(40) NULL DEFAULT NULL, `createdOnServerTime` DATETIME NULL DEFAULT NULL, `follCnt` BIGINT(20) NULL DEFAULT '0', `favCnt` BIGINT(20) NULL DEFAULT '0', `totStatusCnt` BIGINT(20) NULL DEFAULT NULL, `usrCrtDate` VARCHAR(200) NULL DEFAULT NULL, `humanSentiment` VARCHAR(30) NULL DEFAULT NULL, `replied` BIT(1) NULL DEFAULT NULL, `replyMsg` TEXT NULL, `classified` INT(32) NULL DEFAULT NULL, `createdOnGMTDate` DATETIME NULL DEFAULT NULL, `locationDetail` TEXT NULL, `geonameid` INT(11) NULL DEFAULT NULL, `country` VARCHAR(255) NULL DEFAULT NULL, `continent` CHAR(2) NULL DEFAULT NULL, `placeLongitude` FLOAT NULL DEFAULT NULL, `placeLatitude` FLOAT NULL DEFAULT NULL, PRIMARY KEY (`id`), INDEX `id` (`id`, `word`), INDEX `createdOnGMT_index` (`createdOnGMT`) USING BTREE, INDEX `word_index` (`word`) USING BTREE, INDEX `location_index` (`location`) USING BTREE, INDEX `classified_index` (`classified`) USING BTREE, INDEX `tweetType_index` (`tweetType`) USING BTREE, INDEX `getunclassified_index` (`classified`, `tweetType`) USING BTREE, INDEX `timeline_index` (`word`, `createdOnGMTDate`, `classified`) USING BTREE, INDEX `createdOnGMTDate_index` (`createdOnGMTDate`) USING BTREE, INDEX `locdetail_index` (`country`, `id`) USING BTREE, FULLTEXT INDEX `twtText_index` (`twtText`) ) COLLATE='utf8_general_ci' ENGINE=MyISAM ROW_FORMAT=DEFAULT AUTO_INCREMENT=12608048; The table has more than 10 million records. How can I optimize it?

Read the article

Mysql Sub Select Query Optimization

- by Matt

I'm running a query daily to compile stats - but it seems really inefficient. This is the Query: SELECT a.id, tstamp, label_id, (SELECT author_id FROM b WHERE b.tid = a.id ORDER BY b.tstamp DESC LIMIT 1) AS author_id FROM a, b WHERE (status = '2' OR status = '3') AND category != 6 AND a.id = b.tid AND (b.type = 'C' OR b.type = 'R') AND a.tstamp1 BETWEEN {$timestamp_start} AND {$timestamp_end} ORDER BY b.tstamp DESC LIMIT 500 This query seems to run really slow. Apologies for the crap naming - I've been asked to not reveal the actual table names. The reason there is a sub select is because the outer select gets one row from the table a and it gets a row from table b. But also need to know the latest author_id from table b as well, so I run a subselect to return that one. I don't want to run another select inside a php loop - as that is also inefficient. It works correctly - I just need to find a much faster way of getting this data set.

Search Results

Search found 90546 results on 3622 pages for 'code optimization'.

Page 17/3622 | < Previous Page | 13 14 15 16 17 18 19 20 21 22 23 24 | Next Page >

- by Quantum7

- by solinent

- by Ricardo Peres

- by user828584

- by JamesM-SiteGen

- by Michal B.

- by Rob W

- by Val Nolav

- by user342079

- by Budda

- by mat_geek

- by Jeff Mercado

- by bitLost

- by morpheous

- by S_H

- by xApple

- by Jacob Stanley

- by Goz

- by SigTerm

- by Tom

- by Aerovistae

- by rano

- by Soubok

- by rohitgu

- by Matt

< Previous Page | 13 14 15 16 17 18 19 20 21 22 23 24 | Next Page >