Search Results

Search found 740 results on 30 pages for 'processors'.

Page 25/30 | < Previous Page | 21 22 23 24 25 26 27 28 29 30  | Next Page >

  • Compiler optimization causing the performance to slow down

    - by aJ
    I have one strange problem. I have following piece of code: template<clss index, class policy> inline int CBase<index,policy>::func(const A& test_in, int* srcPtr ,int* dstPtr) { int width = test_in.width(); int height = test_in.height(); double d = 0.0; //here is the problem for(int y = 0; y < height; y++) { //Pointer initializations //multiplication involving y //ex: int z = someBigNumber*y + someOtherBigNumber; for(int x = 0; x < width; x++) { //multiplication involving x //ex: int z = someBigNumber*x + someOtherBigNumber; if(soemCondition) { // floating point calculations } *dstPtr++ = array[*srcPtr++]; } } } The inner loop gets executed nearly 200,000 times and the entire function takes 100 ms for completion. ( profiled using AQTimer) I found an unused variable double d = 0.0; outside the outer loop and removed the same. After this change, suddenly the method is taking 500ms for the same number of executions. ( 5 times slower). This behavior is reproducible in different machines with different processor types. (Core2, dualcore processors). I am using VC6 compiler with optimization level O2. Follwing are the other compiler options used : -MD -O2 -Z7 -GR -GX -G5 -X -GF -EHa I suspected compiler optimizations and removed the compiler optimization /O2. After that function became normal and it is taking 100ms as old code. Could anyone throw some light on this strange behavior? Why compiler optimization should slow down performance when I remove unused variable ? Note: The assembly code (before and after the change) looked same.

    Read the article

  • What do I need for development for an ARM processor?

    - by claws
    Hello, I'm familiar with X86[-64] architecture & assembly. I want to start develop for an ARM processor. But unlike desktop processors, I don't have an actual ARM processor. I think I need an ARM simulator. http://www.armtutorial.com/ say An ARM assembly compiler will be required, the most accessible is the ARMulator. I thought of downloading Armulator but found from http://forums.arm.com/index.php?showtopic=13744 that Its not sold seperately. But you can download an eval of RVDS - which includes RVISS/ARMulator I've downloaded & installed RVDS but It looks very complex. I'm unable to figure out what do I need to do to write ARM assembly & run it. I want to write in assembly not in C/C++. I don't have an ARM processor. What is a good simulator? Can any one please mention in short. How to write assembly & assemble & simulate using RVDS. Please be clear? Are there any other alternative ways. I can't afford buying any kind of boards. I always learn from books rather than tutorials. I'm following these two books: ARM System Developer's Guide: Designing and Optimizing System Software (The Morgan Kaufmann Series in Computer Architecture and Design) ARM System-on-Chip Architecture (2nd Edition) Do you have any better suggestions?

    Read the article

  • PInvokeStackImbalance was detected when manually running Xna Content pipeline

    - by Miau
    So I m running this code static readonly string[] PipelineAssemblies = { "Microsoft.Xna.Framework.Content.Pipeline.FBXImporter" + XnaVersion, "Microsoft.Xna.Framework.Content.Pipeline.XImporter" + XnaVersion, "Microsoft.Xna.Framework.Content.Pipeline.TextureImporter" + XnaVersion, "Microsoft.Xna.Framework.Content.Pipeline.EffectImporter" + XnaVersion, "Microsoft.Xna.Framework.Content.Pipeline.XImporter" + XnaVersion, "Microsoft.Xna.Framework.Content.Pipeline.AudioImporters" + XnaVersion, "Microsoft.Xna.Framework.Content.Pipeline.VideoImporters" + XnaVersion, }; more code in between ..... // Register any custom importers or processors. foreach (string pipelineAssembly in PipelineAssemblies) { _buildProject.AddItem("Reference", pipelineAssembly); } more code in between ..... var execute = Task.Factory.StartNew(() => submission.ExecuteAsync(null, null), cancellationTokenSource.Token); var endBuild = execute.ContinueWith(ant => BuildManager.DefaultBuildManager.EndBuild()); endBuild.Wait(); Basically trying to build the content project with code ... this works well if you remove XImporter, AudioIMporters and VideoImporters but with those I get the following error: PInvokeStackImbalance was detected Message: A call to PInvoke function 'Microsoft.Xna.Framework.Content.Pipeline!Microsoft.Xna.Framework.Content.Pipeline.UnsafeNativeMethods+AudioHelper::GetFormatSize' has unbalanced the stack. This is likely because the managed PInvoke signature does not match the unmanaged target signature. Check that the calling convention and parameters of the PInvoke signature match the target unmanaged signature. Things I've tried: turn unsafe code on adding a lot of logging (the dll is found in case you are wondering) different wav and Mp3 files (just in case) Will update...

    Read the article

  • Subscription Based Billing

    - by regex
    Hello All, I'm putting together a small start up company which will be set up with a subscription based billing model. The bill will go to customers on either a monthly or quarterly basis depending on the end user's preference. My question is two parted: I'm new to online billing and I'm only really aware of Pay Pal when it comes to third party bill payment, but this seems more like a check out system. I'm sure there are better alternatives than PayPal for third party billing processors (I have tried Googling for them, but I'm having trouble finding exactly what I'm looking for). What options (companies) are available for third party payment processing and what types of experiences (good or bad) have you had with them? We would like to give our customers the ability to set up recurring payments. I'd rather not store a customer's credit card number on our database as I imagine there are a plethora of compliance guidelines around this. Is there a third party solution for recurring payment processing? On a side note, this is not necessarily a code related question and is more business focused. I wasn't sure if there was a better route for posting this question, and please commont or modify this if there is another route I should take. Thanks!!

    Read the article

  • How to compile a C DLL for 64 bit with Visual Studio 2010?

    - by Daren Thomas
    I have a DLL written in C in source code. This is the code for the General Polygon Clipper (in case you are interested). I'm using it in a C# project via the C# wrapper provided on the homepage. This comes with a precompiled DLL. Since switching to a 64bit Development machine with Visual Studio 2010 and Windows 7 64 bit, the application won't run anymore. This is the error I get: An attempt was made to load a program with an incorrect format. This is because of DLLImporting the 32bit gpc.dll, as I have gathered from stuff found on the web. I assume this will all go away if I recompile the DLL to 64bit, but can't for the love of me figure out how to do so. My C skills are basic, in that I can write a C program with the GNU tools, but have no experience with various compilers / processors / IDEs etc. I believe I could port this to C#. By that I mean I trust myself to actually pull it off. But I'd prefer not to, since it is a lot of work that I'd prefer a compiler to do for me ;)

    Read the article

  • Bare-metal virtualisation for the desktop

    - by Andrew Taylor
    Hi, Does anyone have any knowledge about bare-metal virtualisation products? I'm interested in building a new desktop machine for home, I've been looking at the Intel Quad Core processors and I'd like to put 8GB of RAM in there, but, it got me thinking about making the most out of the available resources. I thought if I could get a good 64bit machine, put some bare-metal virtualisation on, then have a primary system, I'd also be able to bring up some extra virtualised systems as and when I needed. I know most of the bare metal systems are designed for the server market, but, is there anything out there that works well for a desktop. What are the caveats? I presume I won't be able to make the most out of any video cards I could buy, what about just getting a decent screen resolution, will this be a problem? I run a single 24" screen. What about DVD/CD writing, is this possible? I'd like to re-rip my CD collection, I was hoping the quad 64Bit goodness would help me out with the encoding. I currently use a Mac and couldn't go back to windows so that leaves Linux, I was thinking a primary OS of ubuntu. Does this make a difference? Thanks Andrew

    Read the article

  • Why is my multithreaded Java program not maxing out all my cores on my machine?

    - by James B
    Hi, I have a program that starts up and creates an in-memory data model and then creates a (command-line-specified) number of threads to run several string checking algorithms against an input set and that data model. The work is divided amongst the threads along the input set of strings, and then each thread iterates the same in-memory data model instance (which is never updated again, so there are no synchronization issues). I'm running this on a Windows 2003 64-bit server with 2 quadcore processors, and from looking at Windows task Manager they aren't being maxed-out, (nor are they looking like they are being particularly taxed) when I run with 10 threads. Is this normal behaviour? It appears that 7 threads all complete a similar amount of work in a similar amount of time, so would you recommend running with 7 threads instead? Should I run it with more threads?...Although I assume this could be detrimental as the JVM will do more context switching between the threads. Alternatively, should I run it with fewer threads? Alternatively, what would be the best tool I could use to measure this?...Would a profiling tool help me out here - indeed, is one of the several profilers better at detecting bottlenecks (assuming I have one here) than the rest? Note, the server is also running SQL Server 2005 (this may or may not be relevant), but nothing much is happening on that database when I am running my program. Note also, the threads are only doing string matching, they aren't doing any I/O or database work or anything else they may need to wait on. Thanks in advance, -James

    Read the article

  • How to invalidate cache when benchmarking?

    - by Michael Buen
    I have this code, that when swapping the order of UsingAs and UsingCast, their performance also swaps. using System; using System.Diagnostics; using System.Linq; using System.IO; class Test { const int Size = 30000000; static void Main() { object[] values = new MemoryStream[Size]; UsingAs(values); UsingCast(values); Console.ReadLine(); } static void UsingCast(object[] values) { Stopwatch sw = Stopwatch.StartNew(); int sum = 0; foreach (object o in values) { if (o is MemoryStream) { var m = (MemoryStream)o; sum += (int)m.Length; } } sw.Stop(); Console.WriteLine("Cast: {0} : {1}", sum, (long)sw.ElapsedMilliseconds); } static void UsingAs(object[] values) { Stopwatch sw = Stopwatch.StartNew(); int sum = 0; foreach (object o in values) { if (o is MemoryStream) { var m = o as MemoryStream; sum += (int)m.Length; } } sw.Stop(); Console.WriteLine("As: {0} : {1}", sum, (long)sw.ElapsedMilliseconds); } } Outputs: As: 0 : 322 Cast: 0 : 281 When doing this... UsingCast(values); UsingAs(values); ...Results to this: Cast: 0 : 322 As: 0 : 281 When doing just this... UsingAs(values); ...Results to this: As: 0 : 322 When doing just this: UsingCast(values); ...Results to this: Cast: 0 : 322 Aside from running them independently, how to invalidate the cache so the second code being benchmarked won't receive the cached memory of first code? Benchmarking aside, just loved the fact that modern processors do this caching magic :-)

    Read the article

  • ExecutorService - scaling

    - by Stanciu Alexandru-Marian
    I am trying to write a program in Java using ExecutorService and it's function invokeAll. My question is: does the invokeAll functions solve the tasks simultaneously? I mean, if i have two processors, there will be two workers in the same time? Because a can't make it to scale correct. It takes the same time to complete the problem if i give newFixedThreadPool(2) or 1. List<Future<PartialSolution>> list = new ArrayList<Future<PartialSolution>>(); Collection<Callable<PartialSolution>> tasks = new ArrayList<Callable<PartialSolution>>(); for(PartialSolution ps : wp) { tasks.add(new Map(ps, keyWords)); } list = executor.invokeAll(tasks); Map is a class that implements Callable and wp is a vector of Partial Solutions, a class that holds some informations in different times. Why doesn't it scale? What could be the problem? Thank you, Alex

    Read the article

  • Fork in Perl not working inside a while loop reading from file

    - by Sag
    Hi, I'm running a while loop reading each line in a file, and then fork processes with the data of the line to a child. After N lines I want to wait for the child processes to end and continue with the next N lines, etc. It looks something like this: while ($w=<INP>) { # ignore file header if ($w=~m/^\D/) { next;} # get data from line chomp $w; @ws = split(/\s/,$w); $m = int($ws[0]); $d = int($ws[1]); $h = int($ws[2]); # only for some days in the year if (($m==3)and($d==15) or ($m==4)and($d==21) or ($m==7)and($d==18)) { die "could not fork" unless defined (my $pid = fork); unless ($pid) { some instructions here using $m, $d, $h ... } push @qpid,$pid; # when all processors are busy, wait for child processes if ($#qpid==($procs-1)) { for my $pid (@qpid) { waitpid $pid, 0; } reset 'q'; } } } close INP; This is not working. After the first round of processes I get some PID equal to 0, the @qpid array gets mixed up, and the file starts to get read at (apparently) random places, jumping back and forth. The end result is that most lines in the file get read two or three times. Any ideas? Thanks a lot in advance, S.

    Read the article

  • What does flushing thread local memory to global memory mean?

    - by Jack Griffith
    Hi, I am aware that the purpose of volatile variables in Java is that writes to such variables are immediately visible to other threads. I am also aware that one of the effects of a synchronized block is to flush thread-local memory to global memory. I have never fully understood the references to 'thread-local' memory in this context. I understand that data which only exists on the stack is thread-local, but when talking about objects on the heap my understanding becomes hazy. I was hoping that to get comments on the following points: When executing on a machine with multiple processors, does flushing thread-local memory simply refer to the flushing of the CPU cache into RAM? When executing on a uniprocessor machine, does this mean anything at all? If it is possible for the heap to have the same variable at two different memory locations (each accessed by a different thread), under what circumstances would this arise? What implications does this have to garbage collection? How aggressively do VMs do this kind of thing? Overall, I think am trying to understand whether thread-local means memory that is physically accessible by only one CPU or if there is logical thread-local heap partitioning done by the VM? Any links to presentations or documentation would be immensely helpful. I have spent time researching this, and although I have found lots of nice literature, I haven't been able to satisfy my curiosity regarding the different situations & definitions of thread-local memory. Thanks very much.

    Read the article

  • [UNIX] Sort lines of massive file by number of words on line (ideally in parallel)

    - by conradlee
    I am working on a community detection algorithm for analyzing social network data from Facebook. The first task, detecting all cliques in the graph, can be done efficiently in parallel, and leaves me with an output like this: 17118 17136 17392 17064 17093 17376 17118 17136 17356 17318 12345 17118 17136 17356 17283 17007 17059 17116 Each of these lines represents a unique clique (a collection of node ids), and I want to sort these lines in descending order by the number of ids per line. In the case of the example above, here's what the output should look like: 17118 17136 17356 17318 12345 17118 17136 17356 17283 17118 17136 17392 17064 17093 17376 17007 17059 17116 (Ties---i.e., lines with the same number of ids---can be sorted arbitrarily.) What is the most efficient way of sorting these lines. Keep the following points in mind: The file I want to sort could be larger than the physical memory of the machine Most of the machines that I'm running this on have several processors, so a parallel solution would be ideal An ideal solution would just be a shell script (probably using sort), but I'm open to simple solutions in python or perl (or any language, as long as it makes the task simple) This task is in some sense very easy---I'm not just looking for any old solution, but rather for a simple and above all efficient solution

    Read the article

  • Password Cracking in 2010 and Beyond

    - by mttr
    I have looked a bit into cryptography and related matters during the last couple of days and am pretty confused by now. I have a question about password strength and am hoping that someone can clear up my confusion by sharing how they think through the following questions. I am becoming obsessed about these things, but need to spend my time otherwise :-) Let's assume we have an eight-digit password that consists of upper and lower-case alphabetic characters, numbers and common symbols. This means we have 8^96 ~= 7.2 quadrillion different possible passwords. As I understand there are at least two approaches to breaking this password. One is to try a brute-force attack where we try to guess each possible combination of characters. How many passwords can modern processors (in 2010, Core i7 Extreme for eg) guess per second (how many instructions does a single password guess take and why)? My guess would be that it takes a modern processor in the order of years to break such a password. Another approach would consist of obtaining a hash of my password as stored by operating systems and then search for collisions. Depending on the type of hash used, we might get the password a lot quicker than by the bruteforce attack. A number of questions about this: Is the assertion in the above sentence correct? How do I think about the time it takes to find collisions for MD4, MD5, etc. hashes? Where does my Snow Leopard store my password hash and what hashing algorithm does it use? And finally, regardless of the strength of file encryption using AES-128/256, the weak link is still my en/decryption password used. Even if breaking the ciphered text would take longer than the lifetime of the universe, a brute-force attack on my de/encryption password (guess password, then try to decrypt file, try next password...), might succeed a lot earlier than the end of the universe. Is that correct? I would be very grateful, if people could have mercy on me and help me think through these probably simple questions, so that I can get back to work.

    Read the article

  • Access cost of dynamically created objects with dynamically allocated members

    - by user343547
    I'm building an application which will have dynamic allocated objects of type A each with a dynamically allocated member (v) similar to the below class class A { int a; int b; int* v; }; where: The memory for v will be allocated in the constructor. v will be allocated once when an object of type A is created and will never need to be resized. The size of v will vary across all instances of A. The application will potentially have a huge number of such objects and mostly need to stream a large number of these objects through the CPU but only need to perform very simple computations on the members variables. Could having v dynamically allocated could mean that an instance of A and its member v are not located together in memory? What tools and techniques can be used to test if this fragmentation is a performance bottleneck? If such fragmentation is a performance issue, are there any techniques that could allow A and v to allocated in a continuous region of memory? Or are there any techniques to aid memory access such as pre-fetching scheme? for example get an object of type A operate on the other member variables whilst pre-fetching v. If the size of v or an acceptable maximum size could be known at compile time would replacing v with a fixed sized array like int v[max_length] lead to better performance? The target platforms are standard desktop machines with x86/AMD64 processors, Windows or Linux OSes and compiled using either GCC or MSVC compilers.

    Read the article

  • How can you make a PHP application require a key to work?

    - by jasondavis
    About 4 years ago I used a php product called amember pro, it is a membership script which has plugins for lie 30 different payment processors, it was an easy way to set up an automated membership site where users would pay a payment and get access to a certain area. The script used ioncube http://www.ioncube.com/sa_encoder.php to prevent non-paying users from using the script, it requered that you register the domain that the script would be used on, you were then given a key to enter into the file that would make the system/script work. Now I am wanting to know how to do such a task, I know ioncube encoder just makes it hard to see the code, in the script I mention, they would just have a small section at the tp of 1 of the included pages that was encrypted and without that part of the code it would break and in addition if the owner of the script did not put you domain in the list and give you a valid key it would not work, also if you tried to use the script on a different domain it would not work. I realize that somewhere in the encrypted code that is must of sent you key to there server and checked that it was valid for the domain name it is on, or possibly it did not even do that, maybe the key would just verify that it matched the domain the script was on, that more likely what it did. Here is where the real question is, How would you make a script require the portion that is encrypted? If I made a script and had a small encrypted part at the top, it would seem a user would be able to easily just remove the encrypted part and figure out what the non encrypted part is doing and fix it to work. Any ideas?

    Read the article

  • Overriding content_type for Rails Paperclip plugin

    - by Fotios
    I think I have a bit of a chicken and egg problem. I would like to set the content_type of a file uploaded via Paperclip. The problem is that the default content_type is only based on extension, but I'd like to base it on another module. I seem to be able to set the content_type with the before_post_process class Upload < ActiveRecord::Base has_attached_file :upload before_post_process :foo def foo logger.debug "Changing content_type" #This works self.upload.instance_write(:content_type,"foobar") # This fails because the file does not actually exist yet self.upload.instance_write(:content_type,file_type(self.upload.path) end # Returns the filetype based on file command (assume it works) def file_type(path) return `file -ib '#{path}'`.split(/;/)[0] end end But...I cannot base the content type on the file because Paperclip doesn't write the file until after_create. And I cannot seem to set the content_type after it has been saved or with the after_create callback (even back in the controller) So I would like to know if I can somehow get access to the actual file object (assume there are no processors doing anything to the original file) before it is saved, so that I can run the file_type command on that. Or is there a way to modify the content_type after the objects have been created.

    Read the article

  • How do I optimize this postfix expression tree for speed?

    - by Peter Stewart
    Thanks to the help I received in this post: I have a nice, concise recursive function to traverse a tree in postfix order: deque <char*> d; void Node::postfix() { if (left != __nullptr) { left->postfix(); } if (right != __nullptr) { right->postfix(); } d.push_front(cargo); return; }; This is an expression tree. The branch nodes are operators randomly selected from an array, and the leaf nodes are values or the variable 'x', also randomly selected from an array. char *values[10]={"1.0","2.0","3.0","4.0","5.0","6.0","7.0","8.0","9.0","x"}; char *ops[4]={"+","-","*","/"}; As this will be called billions of times during a run of the genetic algorithm of which it is a part, I'd like to optimize it for speed. I have a number of questions on this topic which I will ask in separate postings. The first is: how can I get access to each 'cargo' as it is found. That is: instead of pushing 'cargo' onto a deque, and then processing the deque to get the value, I'd like to start processing it right away. I don't yet know about parallel processing in c++, but this would ideally be done concurrently on two different processors. In python, I'd make the function a generator and access succeeding 'cargo's using .next(). But I'm using c++ to speed up the python implementation. I'm thinking that this kind of tree has been around for a long time, and somebody has probably optimized it already. Any Ideas? Thanks

    Read the article

  • Instanced drawing with OpenGL ES 2.0

    - by Mårten Wikström
    In short: Is it possible to use the gl_InstanceID built-in variable in OpenGL ES 2.0? And, if so, how? Some more info: I want to draw multiple instances of an object using glDrawArraysInstanced and gl_InstanceID, and I want my application to run on multiple platforms, including iOS. The specification clearly says that these features require ES 3.0. According to the iOS Device Compatibility Reference ES 3.0 is only available on a few devices (those based on the A7 GPU; so iPhone 5s, but not on iPhone 5 or earlier). So my first assumption was that I needed to avoid using instanced drawing on older iOS devices. However, further down in the compatibility reference document it says that the EXT_draw_instanced extension is supported for all SGX Series 5 processors (that includes iPhone 5 and 4s). This makes me think that I could indeed use instanced drawing on older iOS devices too, by looking up and using the appropriate extension function (EXT or ARB) for glDrawArraysInstanced. I'm currently just running some test code using SDL and GLEW on Windows so I haven't tested anything on iOS yet. However, in my current setup I'm having trouble using the gl_InstanceID built-in variable in a vertex shader. I'm getting the following error message: 'gl_InstanceID' : variable is not available in current GLSL version Enabling the "draw_instanced" extension in GLSL has no effect: #extension GL_ARB_draw_instanced : enable #extension GL_EXT_draw_instanced : enable The error goes away when I specifically declare that I need ES 3.0 (GLSL 300 ES): #version 300 es Although that seem to work fine on my Windows desktop machine in an ES 2.0 context I doubt that this would work on an iPhone 5. So, shall I abandon the idea of being able to use instanced drawing on older iOS devices?

    Read the article

  • Micro Second resolution timestamps on windows.

    - by Nikhil
    How to get micro second resolution timestamps on windows? I am loking for something better than QueryPerformanceCounter, QueryPerformanceFrequency (these can only give you an elapsed time since boot, and are not necessarily accurate if they are called on different threads - ie QueryPerformanceCounter may return different results on different CPUs. There are also some processors that adjust their frequency for power saving, which apparently isn't always reflected in their QueryPerformanceFrequency result.) There is this, http://msdn.microsoft.com/en-us/magazine/cc163996.aspx but it does not seem to be solid. This looks great but its not available for download any more. http://www.ibm.com/developerworks/library/i-seconds/ This is another resource. http://www.lochan.org/2005/keith-cl/useful/win32time.html But requires a number of steps, running a helper program plus some init stuff also, I am not sure if it works on multiple CPUs Also looked at the Wikipedia link on the subject which is interesting but not that useful. http://en.wikipedia.org/wiki/Time_Stamp_Counter If the answer is just do this with BSD or Linux, its a lot easier thats fine, but I would like to confirm this and get some explanation as to why this is so hard in windows and so easy in linux and bsd. Its the same damm hardware...

    Read the article

  • Fastest inline-assembly spinlock

    - by sigvardsen
    I'm writing a multithreaded application in c++, where performance is critical. I need to use a lot of locking while copying small structures between threads, for this I have chosen to use spinlocks. I have done some research and speed testing on this and I found that most implementations are roughly equally fast: Microsofts CRITICAL_SECTION, with SpinCount set to 1000, scores about 140 time units Implementing this algorithm with Microsofts InterlockedCompareExchange scores about 95 time units Ive also tried to use some inline assembly with __asm {} using something like this code and it scores about 70 time units, but I am not sure that a proper memory barrier has been created. Edit: The times given here are the time it takes for 2 threads to lock and unlock the spinlock 1,000,000 times. I know this isn't a lot of difference but as a spinlock is a heavily used object, one would think that programmers would have agreed on the fastest possible way to make a spinlock. Googling it leads to many different approaches however. I would think this aforementioned method would be the fastest if implemented using inline assembly and using the instruction CMPXCHG8B instead of comparing 32bit registers. Furthermore memory barriers must be taken into account, this could be done by LOCK CMPXHG8B (I think?), which guarantees "exclusive rights" to the shared memory between cores. At last [some suggests] that for busy waits should be accompanied by NOP:REP that would enable Hyper-threading processors to switch to another thread, but I am not sure whether this is true or not? From my performance-test of different spinlocks, it is seen that there is not much difference, but for purely academic purpose I would like to know which one is fastest. However as I have extremely limited experience in the assembly-language and with memory barriers, I would be happy if someone could write the assembly code for the last example I provided with LOCK CMPXCHG8B and proper memory barriers in the following template: __asm { spin_lock: ;locking code. spin_unlock: ;unlocking code. }

    Read the article

  • Testing approach for multi-threaded software

    - by Shane MacLaughlin
    I have a piece of mature geospatial software that has recently had areas rewritten to take better advantage of the multiple processors available in modern PCs. Specifically, display, GUI, spatial searching, and main processing have all been hived off to seperate threads. The software has a pretty sizeable GUI automation suite for functional regression, and another smaller one for performance regression. While all automated tests are passing, I'm not convinced that they provide nearly enough coverage in terms of finding bugs relating race conditions, deadlocks, and other nasties associated with multi-threading. What techniques would you use to see if such bugs exist? What techniques would you advocate for rooting them out, assuming there are some in there to root out? What I'm doing so far is running the GUI functional automation on the app running under a debugger, such that I can break out of deadlocks and catch crashes, and plan to make a bounds checker build and repeat the tests against that version. I've also carried out a static analysis of the source via PC-Lint with the hope of locating potential dead locks, but not had any worthwhile results. The application is C++, MFC, mulitple document/view, with a number of threads per doc. The locking mechanism I'm using is based on an object that includes a pointer to a CMutex, which is locked in the ctor and freed in the dtor. I use local variables of this object to lock various bits of code as required, and my mutex has a time out that fires my a warning if the timeout is reached. I avoid locking where possible, using resource copies where possible instead. What other tests would you carry out?

    Read the article

  • Is there a IDE/compiler PC benchmark I can use to compare my PCs performance?

    - by RickL
    I'm looking for a benchmark (and results on other PCs) which would give me an idea of the development performance gain I could get by upgrading my PC, also the benchmark could be used to justify the upgrade to my boss. I use Visual Studio 2008 for my development, so I'd like to get an idea of by what factor the build times would be improved, and also it would be good if the benchmark could incorporate IDE performance (i.e. when editing, using intellisense, opening code files etc) into its result. I currently have an AMD 3800x2, with 2GB RAM on Vista 32. For example, I'd like to know what kind of performance gain I'd see in Visual Studio 2008 with a Q6600, 4GB RAM on Vista 64. And also with other processors, and other RAM sizes... also see whether hard disk performance is a big factor. EDIT: I mentioned Vista 64 because I'm aware that Vista 32 can only use 3GB RAM maximum. So I'd presume that wanting to use more RAM would require Vista 64, but perhaps it could still be slower overall there is a large overhead in using the 32 bit VS 2008 on 64 bit OS.

    Read the article

  • OpenMP: Get total number of running threads

    - by Konrad Rudolph
    I need to know the total number of threads that my application has spawned via OpenMP. Unfortunately, the omp_get_num_threads() function does not work here since it only yields the number of threads in the current team. However, my code runs recursively (divide and conquer, basically) and I want to spawn new threads as long as there are still idle processors, but no more. Is there a way to get around the limitations of omp_get_num_threads and get the total number of running threads? If more detail is required, consider the following pseudo-code that models my workflow quite closely: function divide_and_conquer(Job job, int total_num_threads): if job.is_leaf(): # Recurrence base case. job.process() return left, right = job.divide() current_num_threads = omp_get_num_threads() if current_num_threads < total_num_threads: # (1) #pragma omp parallel num_threads(2) #pragma omp section divide_and_conquer(left, total_num_threads) #pragma omp section divide_and_conquer(right, total_num_threads) else: divide_and_conquer(left, total_num_threads) divide_and_conquer(right, total_num_threads) job = merge(left, right) If I call this code with a total_num_threads value of 4, the conditional annotated with (1) will always evaluate to true (because each thread team will contain at most two threads) and thus the code will always spawn two new threads, no matter how many threads are already running at a higher level. I am searching for a platform-independent way of determining the total number of threads that are currently running in my application.

    Read the article

  • Link failure with either abnormal memory consumption or LNK1106 in Visual Studio 2005.

    - by Corvin
    Hello, I am trying to build a solution for windows XP in Visual Studio 2005. This solution contains 81 projects (static libs, exe's, dlls) and is being successfully used by our partners. I copied the solution bundle from their repository and tried setting it up on 3 similar machines of people in our group. I was successful on two machines and the solution failed to build on my machine. The build on my machine encountered two problems: During a simple build creation of the biggest static library (about 522Mb in debug mode) would fail with the message "13libd\ui1d.lib : fatal error LNK1106: invalid file or disk full: cannot seek to 0x20101879" Full solution rebuild creates this library, however when it comes to linking the library to main .exe file, devenv.exe spawns link.exe which consumes about 80Mb of physical memory and 250MB of virtual and spawns another link.exe, which does the same. This goes on until the system runs out of memory. On PCs of my colleagues where successful build could be performed, there is only one link.exe process which uses all the memory required for linking (about 500Mb physical). There is a plenty of hard drive space on my machine and the file system is NTFS. All three of our systems are similar - Core2Quad processors, 4Gb of RAM, Windows XP SP3. We are using Visual studio installed from the same source. I tried using a different RAM and CPU, using dedicated graphics adapter to eliminate possibility of video memory sharing influencing the build, putting solution files to different location, using different versions of VS 2005 (Professional, Standard and Team Suite), changing the amount of available virtual memory, running memtest86 and building the project from scratch (i.e. a clean bundle). I have read what MSDN says about LNK1106, none of the cases apply to me except for maybe "out of heap space", however I am not sure how I should fight this. The only idea that I have left is reinstalling the OS, however I am not sure that it would help and I am not sure that my situation wouldn't repeat itself on a different machine. Would anyone have any sort of advice for me? Thanks

    Read the article

  • Multiple plugin instance loading with MEF

    - by Dave
    In my last application, using MEF to load plugins went just fine, but now I'm running into a new issue. I have a solution for it that I explain at the end of this question, but I'm looking for other ways to do it. Let's say I have an interface called ApplianceInterface. I also have two plugins that inherit from ApplianceInterface, let's call them Blender and Processor. Now, I would like to have multiple Blenders and Processors in my application, but I am not sure how to instantiate them properly. Before, I would simply use the ImportMany attribute and upon calling ComposeParts, my application would load Blender and Processor. For example: [ImportMany(typeof(ApplianceInterface))] private IEnumerable<ApplianceInterface> Appliances { get; set; } and my Blender and Processor plugins would be attributed like this: [PartCreationPolicy(CreationPolicy.NonShared)] [Export(typeof(MyInterface)] public class Blender : ApplianceInterface { ... } but what this ends up doing for me is populating Appliances with one Blender and one Processor. I need to be able to create an arbitrary number of Blender and Processor objects. Now, from the documentation I understand that [PartCreationPolicy(CreationPolicy.NonShared)] is what allows MEF to create a new instance each time, but is there a similar "magical" way to create a specific number of instances of something using MEF? Up until now, I've relied on [Import] and [ImportMany] to resolve the assemblies. Is my only option to use a global container, and then resolve the export manually using GetExportedValue<? I have tried GetExportedValue< and that implementation does work fine for me, but I was just curious if there is a better, more accepted way to do it.

    Read the article

< Previous Page | 21 22 23 24 25 26 27 28 29 30  | Next Page >