faster - Page 156 - Developer IT

Creating C++ client app for some abstract windows server - how to manage TCP connection to server speed?

- by Kabumbus

So we have some server with some address port and ip. we are developing that server so we can implement on it what ever we need for help. What are standard/best practices for data transfer speed management between C++ windows client app and server (C++)? My main point is in how to get how much data can be uploaded/downloaded from/to client via his low speed network to my relatively super fast server. (I need it for set up of his live stream Audio/Video bit rate) My try on explaining number 3. We do not care how fast is our server. It is always faster than needed. We care about client tyring to stream out to our server his media. he streams encoded (via ffmpeg) live video data to our server. But he has say ADSL with 500kb/s of outgoing traffic. Also he uses some ICQ or what so ever so he has less than 500 kb/s per second. And he wants to stream live video! So we need to set up our ffmpeg to encode video with respect to the bit rate user can provide. We develop server side and client side. We need a way of finding out how much user can upload per second currently (so value can change dynamically over time)

Read the article

Using Memcached in Python/Django - questions.

- by Thomas

I am starting use Memcached to make my website faster. For constant data in my database I use this: from django.core.cache import cache cache_key = 'regions' regions = cache.get(cache_key) if result is None: """Not Found in Cache""" regions = Regions.objects.all() cache.set(cache_key, regions, 2592000) #(2592000sekund = 30 dni) return regions For seldom changed data I use signals: from django.core.cache import cache from django.db.models import signals def nuke_social_network_cache(self, instance, **kwargs): cache_key = 'networks_for_%s' % (self.instance.user_id,) cache.delete(cache_key) signals.post_save.connect(nuke_social_network_cache, sender=SocialNetworkProfile) signals.post_delete.connect(nuke_social_network_cache, sender=SocialNetworkProfile) Is it correct way? I installed django-memcached-0.1.2, which show me: Memcached Server Stats Server Keys Hits Gets Hit_Rate Traffic_In Traffic_Out Usage Uptime 127.0.0.1 15 220 276 79% 83.1 KB 364.1 KB 18.4 KB 22:21:25 Can sombody explain what columns means? And last question. I have templates where I am getting much records from a few table (relationships). So in my view I get records from one table and in templates show it and related info from others. Generating page last a few seconds for very small table (<100records). Is it some easy way to cache queries from templates? Have I to do some big structure in my view (with all related tables), cache it and send to template?

Read the article

Any utility to test expand C/C++ #define macros?

- by Randy

It seems I often spend way too much time trying to get a #define macro to do exactly what i want. I'll post my current delemia below and any help is appreciated. But really the bigger question is whether there is any utility someone could reccomend, to quickly display what a macro is actually doing? It seems like even the slow trial and error process would go much faster if I could see what is wrong. Currently, I'm dynamically loading a long list of functions from a DLL I made. The way I've set things up, the function pointers have the same nanes as the exported functions, and the typedef(s) used to prototyp them have the same names, but with a prepended underscor. So I want to use a define to simplfy assignments of a long long list of function pointers. For example, In the code statement below, 'hexdump' is the name of a typdef'd function point, and is also the name of the function, while _hexdump is the name of the typedef. If GetProcAddress() fails, a failure counter in incremented. if (!(hexdump = (_hexdump)GetProcAddress(h, "hexdump"))) --iFail; So lets say I'd like to rplace each line like the above with a macro, like this... GETADDR_FOR(hexdump ) Well this is the best I've come up with so far. It doesn't work (my // comment is just to prevent text formatting in the message)... // #define GETADDR_FOR(a) if (!(a = (#_#a)GetProcAddress(h, "/""#a"/""))) --iFail; And again, while I'd APPRECIATE an insight into what silly mistake I've made, it would make my day to have a utility that would show me the error of my ways, by simply plugging in my macro

Read the article

Currently using View, Should I use a hard table instead?

- by 1001010101

Read the article

How to hold a queue of messages and have a group of working threads without polling?

- by Mark

I have a workflow that I want to looks something like this: / Worker 1 \ =Request Channel= - [Holding Queue|||] - Worker 2 - =Response Channel= \ Worker 3 / That is: Requests come in and they enter a FIFO queue Identical workers then pick up tasks from the queue At any given time any worker may work only one task When a worker is free and the holding queue is non-empty the worker should immediately pick up another task When tasks are complete, a worker places the result on the Response Channel I know there are QueueChannels in Spring Integration, but these channels require polling (which seems suboptimal). In particular, if a worker can be busy, I'd like the worker to be busy. Also, I've considered avoiding the queue altogether and simply letting tasks round-robin to all workers, but it's preferable to have a single waiting line as some tasks may be accomplished faster than others. Furthermore, I'd like insight into how many jobs are remaining (which I can get from the queue) and the ability to cancel all or particular jobs. How can I implement this message queuing/work distribution pattern while avoiding a polling? Edit: It appears I'm looking for the Message Dispatcher pattern -- how can I implement this using Spring/Spring Integration?

Read the article

Slightly different execution times between python2 and python3

- by user557634

Hi. Lastly I wrote a simple generator of permutations in python (implementation of "plain changes" algorithm described by Knuth in "The Art... 4"). I was curious about the differences in execution time of it between python2 and python3. Here is my function: def perms(s): s = tuple(s) N = len(s) if N <= 1: yield s[:] raise StopIteration() for x in perms(s[1:]): for i in range(0,N): yield x[:i] + (s[0],) + x[i:] I tested both using timeit module. My tests: $ echo "python2.6:" && ./testing.py && echo "python3:" && ./testing3.py python2.6: args time[ms] 1 0.003811 2 0.008268 3 0.015907 4 0.042646 5 0.166755 6 0.908796 7 6.117996 8 48.346996 9 433.928967 10 4379.904032 python3: args time[ms] 1 0.00246778964996 2 0.00656183719635 3 0.01419159912 4 0.0406293644678 5 0.165960511097 6 0.923101452814 7 6.24257639835 8 53.0099868774 9 454.540967941 10 4585.83498001 As you can see, for number of arguments less than 6, python 3 is faster, but then roles are reversed and python2.6 does better. As I am a novice in python programming, I wonder why is that so? Or maybe my script is more optimized for python2? Thank you in advance for kind answer :)

Read the article

Multithreaded linked list traversal

- by Rob Bryce

Given a (doubly) linked list of objects (C++), I have an operation that I would like multithread, to perform on each object. The cost of the operation is not uniform for each object. The linked list is the preferred storage for this set of objects for a variety of reasons. The 1st element in each object is the pointer to the next object; the 2nd element is the previous object in the list. I have solved the problem by building an array of nodes, and applying OpenMP. This gave decent performance. I then switched to my own threading routines (based off Windows primitives) and by using InterlockedIncrement() (acting on the index into the array), I can achieve higher overall CPU utilization and faster through-put. Essentially, the threads work by "leap-frog'ing" along the elements. My next approach to optimization is to try to eliminate creating/reusing the array of elements in my linked list. However, I'd like to continue with this "leap-frog" approach and somehow use some nonexistent routine that could be called "InterlockedCompareDereference" - to atomically compare against NULL (end of list) and conditionally dereference & store, returning the dereferenced value. I don't think InterlockedCompareExchangePointer() will work since I cannot atomically dereference the pointer and call this Interlocked() method. I've done some reading and others are suggesting critical sections or spin-locks. Critical sections seem heavy-weight here. I'm tempted to try spin-locks but I thought I'd first pose the question here and ask what other people are doing. I'm not convinced that the InterlockedCompareExchangePointer() method itself could be used like a spin-lock. Then one also has to consider acquire/release/fence semantics... Ideas? Thanks!

Read the article

Creating an object in the loop

- by Jacob

std::vector<double> C(4); for(int i = 0; i < 1000;++i) for(int j = 0; j < 2000; ++j) { C[0] = 1.0; C[1] = 1.0; C[2] = 1.0; C[3] = 1.0; } is much faster than for(int i = 0; i < 1000;++i) for(int j = 0; j < 2000; ++j) { std::vector<double> C(4); C[0] = 1.0; C[1] = 1.0; C[2] = 1.0; C[3] = 1.0; } I realize this happens because std::vector is repeatedly being created and instantiated in the loop, but I was under the impression this would be optimized away. Is it completely wrong to keep variables local in a loop whenever possible? I was under the (perhaps false) impression that this would provide optimization opportunities for the compiler. EDIT: I use VC++2005 (release mode) with full optimization (/Ox)

Read the article

Is this linear search implementation actually useful?

- by Helper Method

In Matters Computational I found this interesting linear search implementation (it's actually my Java implementation ;-)): public static int linearSearch(int[] a, int key) { int high = a.length - 1; int tmp = a[high]; // put a sentinel at the end of the array a[high] = key; int i = 0; while (a[i] != key) { i++; } // restore original value a[high] = tmp; if (i == high && key != tmp) { return NOT_CONTAINED; } return i; } It basically uses a sentinel, which is the searched for value, so that you always find the value and don't have to check for array boundaries. The last element is stored in a temp variable, and then the sentinel is placed at the last position. When the value is found (remember, it is always found due to the sentinel), the original element is restored and the index is checked if it represents the last index and is unequal to the searched for value. If that's the case, -1 (NOT_CONTAINED) is returned, otherwise the index. While I found this implementation really clever, I wonder if it is actually useful. For small arrays, it seems to be always slower, and for large arrays it only seems to be faster when the value is not found. Any ideas?

Read the article

Very simple python functions takes spends long time in function and not subfunctions

- by John Salvatier

I have spent many hours trying to figure what is going on here. The function 'grad_logp' in the code below is called many times in my program, and cProfile and runsnakerun the visualize the results reveals that the function grad_logp spends about .00004s 'locally' every call not in any functions it calls and the function 'n' spends about .00006s locally every call. Together these two times make up about 30% of program time that I care about. It doesn't seem like this is function overhead as other python functions spend far less time 'locally' and merging 'grad_logp' and 'n' does not make my program faster, but the operations that these two functions do seem rather trivial. Does anyone have any suggestions on what might be happening? Have I done something obviously inefficient? Am I misunderstanding how cProfile works? def grad_logp(self, variable, calculation_set ): p = params(self.p,self.parents) return self.n(variable, self.p) def n (self, variable, p ): gradient = self.gg(variable, p) return np.reshape(gradient, np.shape(variable.value)) def gg(self, variable, p): if variable is self: gradient = self._grad_logps['x']( x = self.value, **p) else: gradient = __builtin__.sum([self._pgradient(variable, parameter, value, p) for parameter, value in self.parents.iteritems()]) return gradient

Read the article

Approach for caching data from data logger

- by filip-fku

Greetings, I've been working on a C#.NET app that interacts with a data logger. The user can query and obtain logs for a specified time period, and view plots of the data. Typically a new data log is created every minute and stores a measurement for a few parameters. To get meaningful information out of the logger, a reasonable number of logs need to be acquired - data for at least a few days. The hardware interface is a UART to USB module on the device, which restricts transfers to a maximum of about 30 logs/second. This becomes quite slow when reading in the data acquired over a number of days/weeks. What I would like to do is improve the perceived performance for the user. I realize that with the hardware speed limitation the user will have to wait for the full download cycle at least the first time they acquire a larger set of data. My goal is to cache all data seen by the app, so that it can be obtained faster if ever requested again. The approach I have been considering is to use a light database, like SqlServerCe, that can store the data logs as they are received. I am then hoping to first search the cache prior to querying a device for logs. The cache would be updated with any logs obtained by the request that were not already cached. Finally my question - would you consider this to be a good approach? Are there any better alternatives you can think of? I've tried to search SO and Google for reinforcement of the idea, but I mostly run into discussions of web request/content caching. Thanks for any feedback!

Read the article

Web Audio API and mobile browsers

- by Michael

I've run into a problem while implementing sound and music into an HTML game that I'm building. I'm using the Web Audio API, loading all the sound files with XMLHttpRequests and decoding them into an AudioBufferSourceNode with AudioContext.prototype.decodeAudioData(). It looks something like this: var request = new XMLHttpRequest(); request.open("GET", "soundfile.ogg", true); request.responseType = "arraybuffer"; request.onload = function() { context.decodeAudioData(request.response) } request.send(); Everything plays fine, but on mobile the decodeAudioData takes an absurdly long time for the background music. I then tried using AudioContext.prototype.createMediaElementSource() to load the music from an HTML Audio object, since they support streaming and don't have to load the whole file into memory at once. It looked something like this: var audio = new Audio('soundfile.ogg'); var source = context.createMediaElementSource(audio); var mainVolume = context.createGain(); source.connect(mainVolume); mainVolume.connect(context.destination); This loads much faster, but the audio volume isn't affected by the gain node. Works fine on desktop, so I'm assuming this is a bug/limitation of mobile Chrome (testing on Android). Is there actually no good, well-performing way to handle sound on mobile browsers or am just I doing something stupid?

Read the article

g-wan - reproducing the performance claims

- by user2603628

Using gwan_linux64-bit.tar.bz2 under Ubuntu 12.04 LTS unpacking and running gwan then pointing wrk at it (using a null file null.html) wrk --timeout 10 -t 2 -c 100 -d20s http://127.0.0.1:8080/null.html Running 20s test @ http://127.0.0.1:8080/null.html 2 threads and 100 connections Thread Stats Avg Stdev Max +/- Stdev Latency 11.65s 5.10s 13.89s 83.91% Req/Sec 3.33k 3.65k 12.33k 75.19% 125067 requests in 20.01s, 32.08MB read Socket errors: connect 0, read 37, write 0, timeout 49 Requests/sec: 6251.46 Transfer/sec: 1.60MB .. very poor performance, in fact there seems to be some kind of huge latency issue. During the test gwan is 200% busy and wrk is 67% busy. Pointing at nginx, wrk is 200% busy and nginx is 45% busy: wrk --timeout 10 -t 2 -c 100 -d20s http://127.0.0.1/null.html Thread Stats Avg Stdev Max +/- Stdev Latency 371.81us 134.05us 24.04ms 91.26% Req/Sec 72.75k 7.38k 109.22k 68.21% 2740883 requests in 20.00s, 540.95MB read Requests/sec: 137046.70 Transfer/sec: 27.05MB Pointing weighttpd at nginx gives even faster results: /usr/local/bin/weighttp -k -n 2000000 -c 500 -t 3 http://127.0.0.1/null.html weighttp - a lightweight and simple webserver benchmarking tool starting benchmark... spawning thread #1: 167 concurrent requests, 666667 total requests spawning thread #2: 167 concurrent requests, 666667 total requests spawning thread #3: 166 concurrent requests, 666666 total requests progress: 9% done progress: 19% done progress: 29% done progress: 39% done progress: 49% done progress: 59% done progress: 69% done progress: 79% done progress: 89% done progress: 99% done finished in 7 sec, 13 millisec and 293 microsec, 285172 req/s, 57633 kbyte/s requests: 2000000 total, 2000000 started, 2000000 done, 2000000 succeeded, 0 failed, 0 errored status codes: 2000000 2xx, 0 3xx, 0 4xx, 0 5xx traffic: 413901205 bytes total, 413901205 bytes http, 0 bytes data The server is a virtual 8 core dedicated server (bare metal), under KVM Where do I start looking to identify the problem gwan is having on this platform ? I have tested lighttpd, nginx and node.js on this same OS, and the results are all as one would expect. The server has been tuned in the usual way with expanded ephemeral ports, increased ulimits, adjusted time wait recycling etc.

Read the article

Simple question about the lunarlander example.

- by Smills

I am basing my game off the lunarlander example. This is the run loop I am using (very similar to what is used in lunarlander). I am getting considerable performance issues associated with my drawing, even if I draw almost nothing. I noticed the below method. Why is the canvas being created and set to null each cycle? @Override public void run() { while (mRun) { Canvas c = null; try { c = mSurfaceHolder.lockCanvas();//null synchronized (mSurfaceHolder) { updatePhysics(); doDraw(c); } } finally { // do this in a finally so that if an exception is thrown // during the above, we don't leave the Surface in an // inconsistent state if (c != null) { mSurfaceHolder.unlockCanvasAndPost(c); } } } } Most of the times I have read anything about canvases it is more along the lines of: mField = new Bitmap(...dimensions...); Canvas c = new Canvas(mField); My question is: why is Google's example done that way (null canvas), what are the benefits of this, and is there a faster way to do it?

Read the article

Better language or checking tool?

- by rwallace

This is primarily aimed at programmers who use unmanaged languages like C and C++ in preference to managed languages, forgoing some forms of error checking to obtain benefits like the ability to work in extremely resource constrained systems or the last increment of performance, though I would also be interested in answers from those who use managed languages. Which of the following would be of most value? A language that would optionally compile to CLR byte code or to machine code via C, and would provide things like optional array bounds checking, more support for memory management in environments where you can't use garbage collection, and faster compile times than typical C++ projects. (Think e.g. Ada or Eiffel with Python syntax.) A tool that would take existing C code and perform static analysis to look for things like potential null pointer dereferences and array overflows. (Think e.g. an open source equivalent to Coverity.) Something else I haven't thought of. Or put another way, when you're using C family languages, is the top of your wish list more expressiveness, better error checking or something else? The reason I'm asking is that I have a design and prototype parser for #1, and an outline design for #2, and I'm wondering which would be the better use of resources to work on after my current project is up and running; but I think the answers may be useful for other tools programmers also. (As usual with questions of this nature, if the answer you would give is already there, please upvote it.)

Read the article

Fast path cache generation for a connected node graph

- by Sukasa

I'm trying to get a faster pathfinding mechanism in place in a game I'm working on for a connected node graph. The nodes are classed into two types, "Networks" and "Routers." In this picture, the blue circles represent routers and the grey rectangles networks. Each network keeps a list of which routers it is connected to, and vice-versa. Routers cannot connect directly to other routers, and networks cannot connect directly to other networks. Networks list which routers they're connected to Routers do the same I need to get an algorithm that will map out a path, measured in the number of networks crossed, for each possible source and destination network excluding paths where the source and destination are the same network. I have one right now, however it is unusably slow, taking about two seconds to map the paths, which becomes incredibly noticeable for all connected players. The current algorithm is a depth-first brute-force search (It was thrown together in about an hour to just get the path caching working) which returns an array of networks in the order they are traversed, which explains why it's so slow. Are there any algorithms that are more efficient? As a side note, while these example graphs have four networks, the in-practice graphs have 55 networks and about 20 routers in use. Paths which are not possible also can occur, and as well at any time the network/router graph topography can change, requiring the path cache to be rebuilt. What approach/algorithm would likely provide the best results for this type of a graph?

Read the article

byte + byte = int... why?

- by Robert C. Cartaino

Looking at this C# code... byte x = 1; byte y = 2; byte z = x + y; // ERROR: Cannot implicitly convert type 'int' to 'byte' The result of any math performed on byte (or short) types is implicitly cast back to an integer. The solution is to explicitly cast the result back to a byte, so... byte z = (byte)(x + y); // works What I am wondering is why? Is it architectural? Philosophical? We have: int + int = int long + long = long float + float = float double + double = double So why not: byte + byte = byte short + short = short ? A bit of background: I am performing a long list of calculations on "small numbers" (i.e. < 8) and storing the intermediate results in a large array. Using a byte array (instead of an int array) is faster (because of cache hits). But the extensive byte-casts spread through the code make it that much more unreadable.

Read the article

Batch insert mode with hibernate and oracle: seems to be dropping back to slow mode silently

- by Chris

I'm trying to get a batch insert working with Hibernate into Oracle, according to what i've read here: http://docs.jboss.org/hibernate/core/3.3/reference/en/html/batch.html , but with my benchmarking it doesn't seem any faster than before. Can anyone suggest a way to prove whether hibernate is using batch mode or not? I hear that there are numerous reasons why it may silently drop into normal mode (eg associations and generated ids) so is there some way to find out why it has gone non-batch? My hibernate.cfg.xml contains this line which i believe is all i need to enable batch mode: <property name="jdbc.batch_size">50</property> My insert code looks like this: List<LogEntry> entries = ..a list of 100 LogEntry data classes... Session sess = sessionFactory.getCurrentSession(); for(LogEntry e : entries) { sess.save(e); } sess.flush(); sess.clear(); My 'logentry' class has no associations, the only interesting field is the id: @Entity @Table(name="log_entries") public class LogEntry { @Id @GeneratedValue public Long id; ..other fields - strings and ints... However, since it is oracle, i believe the @GeneratedValue will use the sequence generator. And i believe that only the 'identity' generator will stop bulk inserts. So if anyone can explain why it isn't running in batch mode, or how i can find out for sure if it is or isn't in batch mode, or find out why hibernate is silently dropping back to slow mode, i'd be most grateful. Thanks

Read the article

Jquery Mobile app focus-based navigation stops working after switching between pages

- by nawar

As much as I would like to expand on the details here, I am not able to find relevant information about the root cause of this problem. I am having this issue with my blackberry Webapp which I built using JQM. After few times of navigation from page to page, the application becomes unresponsive on the destination page and I am not able to scroll up/down using the touchpad. If someone had this problem or some clue to the resolution, then that would be helpful. Edit: after doing some research I was able to narrow down the cause of the issue. I am having an issue with focus-based navigation. As I lose focus on the page elements (buttons, input fields, etc) after few transitions among the pages. Edit I had to switch back to the cursor based navigation as it is much faster and do not have the issue faced by focus-based navigation. I removed the entry: <rim:navigation mode=”focus”/> from the config.xml file I found this entry on the blackberry fourms but it haven't solved my problem despite the fact I upgraded my WebWorks SDK to 2.0 from 1.5 http://supportforums.blackberry.com/t5/Web-and-WebWorks-Development/Focus-based-navigation-hangs-device/td-p/455600 Thanks

Read the article

Why does this log output show the same answer in each iteration?

- by Will Hancock

OK, I was reading an article on optimising JS for Googles V8 engine, when i saw this code example... I nearly skimmed over it, but then I saw this; |=; a[0] |= b; a = new Array(); a[0] = 0; for (var b = 0; b < 10; b++) { console.log(a, b) a[0] |= b; // Much better! 2x faster. } a[0] |= b; So I ran it, in my console, with a console.log in the loop and resulted in 15; [15] 0 [15] 1 [15] 2 [15] 3 [15] 4 [15] 5 [15] 6 [15] 7 [15] 8 [15] 9 WHAT?!?! Where the hell does it get 15 from, on every iteration?!?!?! I've been a web dev for 7 years, and this has stumped me and a fellow colleague. Can somebody talk me through this code? Cheers.

Read the article

Preloading Image Bug in IE6-8

- by Kevin C.

Page in question: http://phwsinc.com/our-work/one-rincon-hill.asp In IE6-8, when you click the left-most thumbnail in the gallery, the image never loads. If you click the thumbnail a second time, then it will load. I'm using jQuery, and here's my code that's powering the gallery: $(document).ready(function() { // PROJECT PHOTO GALLERY var thumbs = $('.thumbs li a'); var photoWrapper = $('div.photoWrapper'); if (thumbs.length) { thumbs.click( function(){ photoWrapper.addClass('loading'); var img_src = $(this).attr('href'); // The two lines below are what cause the bug in IE. They make the gallery run much faster in other browsers, though. var new_img = new Image(); new_img.src = img_src; var photo = $('#photo'); photo.fadeOut('slow', function() { photo.attr('src', img_src); photo.load(function() { photoWrapper.removeClass('loading'); photo.fadeIn('slow'); }); }); return false; }); } }); A coworker told me that he's always had problems with the js Image() object, and advised me to just append an <img /> element inside of a div set to display:none;, but that's a little messy for my tastes--I liked using the Image() object, it kept things nice and clean, no unnecessary added HTML markup. Any help would be appreciated. It still works without the image preloading, so if all else fails I'll just wrap the preloading in an if !($.browser.msie){ } and call it a day.

Read the article

Division of text follow the cursor via Javascript/Jquery

- by webzide

Dear experts, I wanted have a dynamic division of content follow you with the cursor in the web browser space. I am not a pro at JS so I spent 2 hours to finally debugged a very stupid way to accomplish this. $(document).ready(function () { function func(evt) { var evt = (evt) ? evt : event; var div = document.createElement('div'); div.style.position = "absolute"; div.style.left = evt.clientX + "px"; div.style.top = evt.clientY + "px"; $(div).attr("id", "current") div.innerHTML = "CURSOR FOLLOW TEXT"; $(this).append($(div)); $(this).unbind() $(this).bind('mousemove', function () { $('div').remove("#current"); }); $(this).bind('mousemove', func); } $("body").bind('mousemove', func) }); As you can see this is pretty much Brute force and it slows down the browser quite a bit. I often experience a lag on the browser as a drag my mouse from one place to another. Is there a way to accomplish this easier and faster and more intuitive. I know you can use the cursor image technique but thats something I'm looking to avoid. Thanks in advance.

Read the article

optimize output value using a class and public member

- by wiso

Suppose you have a function, and you call it a lot of times, every time the function return a big object. I've optimized the problem using a functor that return void, and store the returning value in a public member: #include <vector> const int N = 100; std::vector<double> fun(const std::vector<double> & v, const int n) { std::vector<double> output = v; output[n] *= output[n]; return output; } class F { public: F() : output(N) {}; std::vector<double> output; void operator()(const std::vector<double> & v, const int n) { output = v; output[n] *= n; } }; int main() { std::vector<double> start(N,10.); std::vector<double> end(N); double a; // first solution for (unsigned long int i = 0; i != 10000000; ++i) a = fun(start, 2)[3]; // second solution F f; for (unsigned long int i = 0; i != 10000000; ++i) { f(start, 2); a = f.output[3]; } } Yes, I can use inline or optimize in an other way this problem, but here I want to stress on this problem: with the functor I declare and construct the output variable output only one time, using the function I do that every time it is called. The second solution is two time faster than the first with g++ -O1 or g++ -O2. What do you think about it, is it an ugly optimization?

Read the article

finding N contiguous zero bits in an integer to the left of the MSB from another

- by James Morris

First we find the MSB of the first integer, and then try to find a region of N contiguous zero bits within the second number which is to the left of the MSB from the first integer. Here is the C code for my solution: typedef unsigned int t; unsigned const t_bits = sizeof(t) * CHAR_BIT; _Bool test_fit_within_left_of_msb( unsigned width, t val1, t val2, unsigned* offset_result) { unsigned offbit = 0; unsigned msb = 0; t mask; t b; while(val1 >>= 1) ++msb; while(offbit + width < t_bits - msb) { mask = (((t)1 << width) - 1) << (t_bits - width - offbit); b = val2 & mask; if (!b) { *offset_result = offbit; return true; } if (offbit++) /* this conditional bothers me! */ b <<= offbit - 1; while(b <<= 1) offbit++; } return false; } Aside from faster ways of finding the MSB of the first integer, the commented test for a zero offbit seems a bit extraneous, but necessary to skip the highest bit of type t if it is set. I have also implemented similar algorithms but working to the right of the MSB of the first number, so they don't require this seemingly extra condition. How can I get rid of this extra condition, or even, are there far more optimal solutions?

Read the article

Creating rapid function overloads in C++

- by DeadMG

template<typename Functor, typename Return, typename Arg1, typename Arg2, typename Arg3, typename Arg4, typename Arg5, typename Arg6, typename Arg7, typename Arg8, typename Arg9, typename Arg10> class LambdaCall : public Instruction { public: LambdaCall(Functor func ,unsigned char constructorarg1 ,unsigned char constructorarg2 ,unsigned char constructorarg3 ,unsigned char constructorarg4 ,unsigned char constructorarg5 ,unsigned char constructorarg6 ,unsigned char constructorarg7 ,unsigned char constructorarg8 ,unsigned char constructorarg9 ,unsigned char constructorarg10) : arg1(constructorarg1) , arg2(constructorarg2) , arg3(constructorarg3) , arg4(constructorarg4) , arg5(constructorarg5) , arg6(constructorarg6) , arg7(constructorarg7) , arg8(constructorarg8) , arg9(constructorarg9) , arg10(constructorarg10) , function(func) {} void Call(State& state) { state.Push<Return>(func(*state.GetRegisterValue<Arg1>(arg1) ,*state.GetRegisterValue<Arg1>(arg1) ,*state.GetRegisterValue<Arg2>(arg2) ,*state.GetRegisterValue<Arg3>(arg3) ,*state.GetRegisterValue<Arg4>(arg4) ,*state.GetRegisterValue<Arg5>(arg5) ,*state.GetRegisterValue<Arg6>(arg6) ,*state.GetRegisterValue<Arg7>(arg7) ,*state.GetRegisterValue<Arg8>(arg8) ,*state.GetRegisterValue<Arg9>(arg9) ,*state.GetRegisterValue<Arg10>(arg10) )); } Functor function; unsigned char arg1; unsigned char arg2; unsigned char arg3; unsigned char arg4; unsigned char arg5; unsigned char arg6; unsigned char arg7; unsigned char arg8; unsigned char arg9; unsigned char arg10; }; Then again for every possible number of arguments I want to support, and again for void returns. Any way to do this faster?

Search Results

Search found 4580 results on 184 pages for 'faster'.

Page 156/184 | < Previous Page | 152 153 154 155 156 157 158 159 160 161 162 163 | Next Page >

- by Kabumbus

- by Thomas

- by Randy

- by 1001010101

- by Mark

- by user557634

- by Rob Bryce

- by Jacob

- by Helper Method

- by John Salvatier

- by filip-fku

- by Michael

- by user2603628

- by Smills

- by rwallace

- by Sukasa

- by Robert C. Cartaino

- by Chris

- by nawar

- by Will Hancock

- by Kevin C.

- by webzide

- by wiso

- by James Morris

- by DeadMG

< Previous Page | 152 153 154 155 156 157 158 159 160 161 162 163 | Next Page >