Search Results

Search found 740 results on 30 pages for 'processors'.

Page 25/30

  • What do I need for development for an ARM processor?

    - by claws
    Hello, I'm familiar with the x86[-64] architecture & assembly. I want to start developing for an ARM processor, but unlike with desktop processors, I don't have an actual ARM processor, so I think I need an ARM simulator. http://www.armtutorial.com/ says "An ARM assembly compiler will be required, the most accessible is the ARMulator." I thought of downloading the ARMulator but found from http://forums.arm.com/index.php?showtopic=13744 that it's not sold separately: "But you can download an eval of RVDS - which includes RVISS/ARMulator". I've downloaded & installed RVDS, but it looks very complex and I'm unable to figure out what I need to do to write ARM assembly & run it. I want to write in assembly, not in C/C++, and I don't have an ARM processor. What is a good simulator? Can anyone briefly explain how to write assembly, assemble it and simulate it using RVDS? Are there any other alternatives? I can't afford to buy any kind of board. I always learn from books rather than tutorials. I'm following these two books: ARM System Developer's Guide: Designing and Optimizing System Software (The Morgan Kaufmann Series in Computer Architecture and Design) and ARM System-on-Chip Architecture (2nd Edition). Do you have any better suggestions?

  • PInvokeStackImbalance was detected when manually running Xna Content pipeline

    - by Miau
    So I'm running this code: static readonly string[] PipelineAssemblies = { "Microsoft.Xna.Framework.Content.Pipeline.FBXImporter" + XnaVersion, "Microsoft.Xna.Framework.Content.Pipeline.XImporter" + XnaVersion, "Microsoft.Xna.Framework.Content.Pipeline.TextureImporter" + XnaVersion, "Microsoft.Xna.Framework.Content.Pipeline.EffectImporter" + XnaVersion, "Microsoft.Xna.Framework.Content.Pipeline.XImporter" + XnaVersion, "Microsoft.Xna.Framework.Content.Pipeline.AudioImporters" + XnaVersion, "Microsoft.Xna.Framework.Content.Pipeline.VideoImporters" + XnaVersion, }; more code in between ..... // Register any custom importers or processors. foreach (string pipelineAssembly in PipelineAssemblies) { _buildProject.AddItem("Reference", pipelineAssembly); } more code in between ..... var execute = Task.Factory.StartNew(() => submission.ExecuteAsync(null, null), cancellationTokenSource.Token); var endBuild = execute.ContinueWith(ant => BuildManager.DefaultBuildManager.EndBuild()); endBuild.Wait(); Basically I'm trying to build the content project from code... this works well if you remove XImporter, AudioImporters and VideoImporters, but with those I get the following error: PInvokeStackImbalance was detected Message: A call to PInvoke function 'Microsoft.Xna.Framework.Content.Pipeline!Microsoft.Xna.Framework.Content.Pipeline.UnsafeNativeMethods+AudioHelper::GetFormatSize' has unbalanced the stack. This is likely because the managed PInvoke signature does not match the unmanaged target signature. Check that the calling convention and parameters of the PInvoke signature match the target unmanaged signature. Things I've tried: turning unsafe code on; adding a lot of logging (the DLL is found, in case you are wondering); different WAV and MP3 files (just in case). Will update...

  • Compiler optimization causing the performance to slow down

    - by aJ
    I have a strange problem with the following piece of code: template<class index, class policy> inline int CBase<index,policy>::func(const A& test_in, int* srcPtr, int* dstPtr) { int width = test_in.width(); int height = test_in.height(); double d = 0.0; //here is the problem for(int y = 0; y < height; y++) { //Pointer initializations //multiplication involving y //ex: int z = someBigNumber*y + someOtherBigNumber; for(int x = 0; x < width; x++) { //multiplication involving x //ex: int z = someBigNumber*x + someOtherBigNumber; if(someCondition) { // floating point calculations } *dstPtr++ = array[*srcPtr++]; } } } The inner loop gets executed nearly 200,000 times and the entire function takes 100 ms to complete (profiled using AQTimer). I found an unused variable double d = 0.0; outside the outer loop and removed it. After this change, the method suddenly takes 500 ms for the same number of executions (5 times slower). This behavior is reproducible on different machines with different processor types (Core2, dual-core processors). I am using the VC6 compiler with optimization level O2. Following are the other compiler options used: -MD -O2 -Z7 -GR -GX -G5 -X -GF -EHa I suspected compiler optimizations and removed the compiler optimization /O2. After that the function became normal again and takes 100 ms, as the old code did. Could anyone throw some light on this strange behavior? Why should compiler optimization slow down performance when I remove an unused variable? Note: the assembly code (before and after the change) looked the same.

  • Subscription Based Billing

    - by regex
    Hello All, I'm putting together a small start-up company which will be set up with a subscription-based billing model. The bill will go to customers on either a monthly or quarterly basis, depending on the end user's preference. My question has two parts: I'm new to online billing and I'm only really aware of PayPal when it comes to third-party bill payment, but this seems more like a checkout system. I'm sure there are better alternatives than PayPal for third-party billing processors (I have tried Googling for them, but I'm having trouble finding exactly what I'm looking for). What options (companies) are available for third-party payment processing, and what types of experiences (good or bad) have you had with them? We would like to give our customers the ability to set up recurring payments. I'd rather not store a customer's credit card number in our database, as I imagine there are a plethora of compliance guidelines around this. Is there a third-party solution for recurring payment processing? On a side note, this is not necessarily a code-related question and is more business focused. I wasn't sure if there was a better route for posting this question, so please comment on or modify this if there is another route I should take. Thanks!!

  • How to compile a C DLL for 64 bit with Visual Studio 2010?

    - by Daren Thomas
    I have the source code for a DLL written in C. This is the code for the General Polygon Clipper (in case you are interested). I'm using it in a C# project via the C# wrapper provided on the homepage, which comes with a precompiled DLL. Since switching to a 64-bit development machine with Visual Studio 2010 and Windows 7 64-bit, the application won't run anymore. This is the error I get: An attempt was made to load a program with an incorrect format. This is because of DllImporting the 32-bit gpc.dll, as I have gathered from stuff found on the web. I assume this will all go away if I recompile the DLL for 64-bit, but I can't for the love of me figure out how to do so. My C skills are basic, in that I can write a C program with the GNU tools, but I have no experience with the various compilers / processors / IDEs etc. I believe I could port this to C#. By that I mean I trust myself to actually pull it off. But I'd prefer not to, since it is a lot of work that I'd prefer a compiler to do for me ;)

  • Bare-metal virtualisation for the desktop

    - by Andrew Taylor
    Hi, Does anyone have any knowledge about bare-metal virtualisation products? I'm interested in building a new desktop machine for home. I've been looking at the Intel quad-core processors and I'd like to put 8GB of RAM in there, but it got me thinking about making the most of the available resources. I thought that if I could get a good 64-bit machine, put some bare-metal virtualisation on it and then have a primary system, I'd also be able to bring up some extra virtualised systems as and when I needed them. I know most of the bare-metal systems are designed for the server market, but is there anything out there that works well for a desktop? What are the caveats? I presume I won't be able to make the most of any video card I buy, but will just getting a decent screen resolution be a problem? I run a single 24" screen. What about DVD/CD writing, is this possible? I'd like to re-rip my CD collection, and I was hoping the quad-core 64-bit goodness would help me out with the encoding. I currently use a Mac and couldn't go back to Windows, so that leaves Linux; I was thinking of Ubuntu as the primary OS. Does this make a difference? Thanks Andrew

  • Why is my multithreaded Java program not maxing out all my cores on my machine?

    - by James B
    Hi, I have a program that starts up and creates an in-memory data model and then creates a (command-line-specified) number of threads to run several string checking algorithms against an input set and that data model. The work is divided amongst the threads along the input set of strings, and then each thread iterates the same in-memory data model instance (which is never updated again, so there are no synchronization issues). I'm running this on a Windows 2003 64-bit server with 2 quadcore processors, and from looking at Windows task Manager they aren't being maxed-out, (nor are they looking like they are being particularly taxed) when I run with 10 threads. Is this normal behaviour? It appears that 7 threads all complete a similar amount of work in a similar amount of time, so would you recommend running with 7 threads instead? Should I run it with more threads?...Although I assume this could be detrimental as the JVM will do more context switching between the threads. Alternatively, should I run it with fewer threads? Alternatively, what would be the best tool I could use to measure this?...Would a profiling tool help me out here - indeed, is one of the several profilers better at detecting bottlenecks (assuming I have one here) than the rest? Note, the server is also running SQL Server 2005 (this may or may not be relevant), but nothing much is happening on that database when I am running my program. Note also, the threads are only doing string matching, they aren't doing any I/O or database work or anything else they may need to wait on. Thanks in advance, -James

  • How to invalidate cache when benchmarking?

    - by Michael Buen
    I have this code, that when swapping the order of UsingAs and UsingCast, their performance also swaps. using System; using System.Diagnostics; using System.Linq; using System.IO; class Test { const int Size = 30000000; static void Main() { object[] values = new MemoryStream[Size]; UsingAs(values); UsingCast(values); Console.ReadLine(); } static void UsingCast(object[] values) { Stopwatch sw = Stopwatch.StartNew(); int sum = 0; foreach (object o in values) { if (o is MemoryStream) { var m = (MemoryStream)o; sum += (int)m.Length; } } sw.Stop(); Console.WriteLine("Cast: {0} : {1}", sum, (long)sw.ElapsedMilliseconds); } static void UsingAs(object[] values) { Stopwatch sw = Stopwatch.StartNew(); int sum = 0; foreach (object o in values) { if (o is MemoryStream) { var m = o as MemoryStream; sum += (int)m.Length; } } sw.Stop(); Console.WriteLine("As: {0} : {1}", sum, (long)sw.ElapsedMilliseconds); } } Outputs: As: 0 : 322 Cast: 0 : 281 When doing this... UsingCast(values); UsingAs(values); ...Results to this: Cast: 0 : 322 As: 0 : 281 When doing just this... UsingAs(values); ...Results to this: As: 0 : 322 When doing just this: UsingCast(values); ...Results to this: Cast: 0 : 322 Aside from running them independently, how to invalidate the cache so the second code being benchmarked won't receive the cached memory of first code? Benchmarking aside, just loved the fact that modern processors do this caching magic :-)

  • ExecutorService - scaling

    - by Stanciu Alexandru-Marian
    I am trying to write a program in Java using ExecutorService and its invokeAll function. My question is: does the invokeAll function execute the tasks simultaneously? I mean, if I have two processors, will there be two workers at the same time? Because I can't make it scale correctly: it takes the same time to complete the problem whether I give newFixedThreadPool(2) or 1. List<Future<PartialSolution>> list = new ArrayList<Future<PartialSolution>>(); Collection<Callable<PartialSolution>> tasks = new ArrayList<Callable<PartialSolution>>(); for(PartialSolution ps : wp) { tasks.add(new Map(ps, keyWords)); } list = executor.invokeAll(tasks); Map is a class that implements Callable, and wp is a vector of PartialSolution, a class that holds some information at different times. Why doesn't it scale? What could be the problem? Thank you, Alex

  • Fork in Perl not working inside a while loop reading from file

    - by Sag
    Hi, I'm running a while loop reading each line in a file, and then fork processes with the data of the line to a child. After N lines I want to wait for the child processes to end and continue with the next N lines, etc. It looks something like this: while ($w=<INP>) { # ignore file header if ($w=~m/^\D/) { next;} # get data from line chomp $w; @ws = split(/\s/,$w); $m = int($ws[0]); $d = int($ws[1]); $h = int($ws[2]); # only for some days in the year if (($m==3)and($d==15) or ($m==4)and($d==21) or ($m==7)and($d==18)) { die "could not fork" unless defined (my $pid = fork); unless ($pid) { some instructions here using $m, $d, $h ... } push @qpid,$pid; # when all processors are busy, wait for child processes if ($#qpid==($procs-1)) { for my $pid (@qpid) { waitpid $pid, 0; } reset 'q'; } } } close INP; This is not working. After the first round of processes I get some PID equal to 0, the @qpid array gets mixed up, and the file starts to get read at (apparently) random places, jumping back and forth. The end result is that most lines in the file get read two or three times. Any ideas? Thanks a lot in advance, S.

  • [UNIX] Sort lines of massive file by number of words on line (ideally in parallel)

    - by conradlee
    I am working on a community detection algorithm for analyzing social network data from Facebook. The first task, detecting all cliques in the graph, can be done efficiently in parallel, and leaves me with an output like this: 17118 17136 17392 17064 17093 17376 17118 17136 17356 17318 12345 17118 17136 17356 17283 17007 17059 17116 Each of these lines represents a unique clique (a collection of node ids), and I want to sort these lines in descending order by the number of ids per line. In the case of the example above, here's what the output should look like: 17118 17136 17356 17318 12345 17118 17136 17356 17283 17118 17136 17392 17064 17093 17376 17007 17059 17116 (Ties---i.e., lines with the same number of ids---can be sorted arbitrarily.) What is the most efficient way of sorting these lines? Keep the following points in mind: the file I want to sort could be larger than the physical memory of the machine; most of the machines that I'm running this on have several processors, so a parallel solution would be ideal; an ideal solution would just be a shell script (probably using sort), but I'm open to simple solutions in Python or Perl (or any language, as long as it makes the task simple). This task is in some sense very easy---I'm not just looking for any old solution, but rather for a simple and above all efficient solution.

  • What does flushing thread local memory to global memory mean?

    - by Jack Griffith
    Hi, I am aware that the purpose of volatile variables in Java is that writes to such variables are immediately visible to other threads. I am also aware that one of the effects of a synchronized block is to flush thread-local memory to global memory. I have never fully understood the references to 'thread-local' memory in this context. I understand that data which only exists on the stack is thread-local, but when talking about objects on the heap my understanding becomes hazy. I was hoping to get comments on the following points: When executing on a machine with multiple processors, does flushing thread-local memory simply refer to the flushing of the CPU cache into RAM? When executing on a uniprocessor machine, does this mean anything at all? If it is possible for the heap to have the same variable at two different memory locations (each accessed by a different thread), under what circumstances would this arise? What implications does this have for garbage collection? How aggressively do VMs do this kind of thing? Overall, I think I am trying to understand whether thread-local means memory that is physically accessible by only one CPU, or whether there is logical thread-local heap partitioning done by the VM. Any links to presentations or documentation would be immensely helpful. I have spent time researching this, and although I have found lots of nice literature, I haven't been able to satisfy my curiosity regarding the different situations & definitions of thread-local memory. Thanks very much.

  • Password Cracking in 2010 and Beyond

    - by mttr
    I have looked a bit into cryptography and related matters during the last couple of days and am pretty confused by now. I have a question about password strength and am hoping that someone can clear up my confusion by sharing how they think through the following questions. I am becoming obsessed about these things, but need to spend my time otherwise :-) Let's assume we have an eight-character password that consists of upper and lower-case alphabetic characters, numbers and common symbols. This means we have 96^8 ~= 7.2 quadrillion different possible passwords. As I understand it, there are at least two approaches to breaking this password. One is to try a brute-force attack where we try to guess each possible combination of characters. How many passwords can modern processors (in 2010, a Core i7 Extreme, for example) guess per second (how many instructions does a single password guess take, and why)? My guess would be that it takes a modern processor on the order of years to break such a password. Another approach would consist of obtaining a hash of my password as stored by the operating system and then searching for collisions. Depending on the type of hash used, we might get the password a lot quicker than by the brute-force attack. A number of questions about this: Is the assertion in the above sentence correct? How do I think about the time it takes to find collisions for MD4, MD5, etc. hashes? Where does my Snow Leopard store my password hash, and what hashing algorithm does it use? And finally, regardless of the strength of file encryption using AES-128/256, the weak link is still the en/decryption password used. Even if breaking the ciphered text would take longer than the lifetime of the universe, a brute-force attack on my de/encryption password (guess a password, then try to decrypt the file, try the next password...) might succeed a lot earlier than the end of the universe. Is that correct? I would be very grateful if people could have mercy on me and help me think through these probably simple questions, so that I can get back to work.
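
    A quick worked example of the brute-force arithmetic, using an assumed guess rate purely for illustration (it is not a measured figure for any particular CPU):

      96^8 = 7,213,895,789,838,336 ~= 7.2 * 10^15 possible passwords
      assumed rate: 10^9 guesses per second on a single machine
      worst case:   7.2 * 10^15 / 10^9 = 7.2 * 10^6 seconds ~= 83 days
      on average:   half the keyspace is searched, so roughly 42 days

    Halving or doubling the assumed rate scales the result linearly, which is one reason offline attacks against a fast, unsalted hash are so much more dangerous than online guessing.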

  • Access cost of dynamically created objects with dynamically allocated members

    - by user343547
    I'm building an application which will have dynamically allocated objects of type A, each with a dynamically allocated member (v), similar to the class below: class A { int a; int b; int* v; }; where: The memory for v will be allocated in the constructor. v will be allocated once when an object of type A is created and will never need to be resized. The size of v will vary across all instances of A. The application will potentially have a huge number of such objects and will mostly need to stream a large number of these objects through the CPU, but only needs to perform very simple computations on the member variables. Could having v dynamically allocated mean that an instance of A and its member v are not located together in memory? What tools and techniques can be used to test if this fragmentation is a performance bottleneck? If such fragmentation is a performance issue, are there any techniques that could allow A and v to be allocated in a contiguous region of memory? Or are there any techniques to aid memory access, such as a pre-fetching scheme? For example: get an object of type A, then operate on the other member variables whilst pre-fetching v. If the size of v, or an acceptable maximum size, could be known at compile time, would replacing v with a fixed-size array like int v[max_length] lead to better performance? The target platforms are standard desktop machines with x86/AMD64 processors, Windows or Linux OSes, compiled using either GCC or MSVC compilers.
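
    A minimal sketch of the two layouts being weighed here; the type names and the max-length bound are invented for illustration and are not from the original code:

      #include <cstddef>

      // Current layout: v lives in a separate heap block, so streaming through
      // many objects may touch two unrelated cache lines per object.
      struct APointer {
          int  a;
          int  b;
          int* v;          // allocated in the constructor, length varies per instance
      };

      // Alternative when an acceptable upper bound is known at compile time:
      // the values sit inside the object, so each instance is one contiguous block.
      const std::size_t kMaxLength = 16;   // assumed bound, purely illustrative
      struct AInline {
          int         a;
          int         b;
          std::size_t length;              // number of entries actually in use
          int         v[kMaxLength];       // wastes space when lengths vary widely
      };

    Whether the inline version wins depends on how much padding the worst-case bound wastes per object; cache profilers such as Cachegrind or VTune can show whether the extra misses of the pointer version actually dominate.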

  • How can you make a PHP application require a key to work?

    - by jasondavis
    About 4 years ago I used a PHP product called aMember Pro. It is a membership script which has plugins for something like 30 different payment processors, and it was an easy way to set up an automated membership site where users would make a payment and get access to a certain area. The script used ionCube (http://www.ioncube.com/sa_encoder.php) to prevent non-paying users from using the script: it required that you register the domain the script would be used on, and you were then given a key to enter into a file to make the system/script work. Now I am wanting to know how to do such a task. I know the ionCube encoder just makes it hard to see the code; in the script I mention, they would just have a small encrypted section at the top of one of the included pages, and without that part of the code it would break. In addition, if the owner of the script did not put your domain in the list and give you a valid key, it would not work, and if you tried to use the script on a different domain it would not work either. I realize that somewhere in the encrypted code it must have sent your key to their server and checked that it was valid for the domain it was on, or possibly it did not even do that; maybe the key would just verify that it matched the domain the script was on, which is more likely what it did. Here is the real question: how would you make a script require the portion that is encrypted? If I made a script with a small encrypted part at the top, it would seem a user could easily just remove the encrypted part, figure out what the non-encrypted part is doing, and fix it to work. Any ideas?

  • Overriding content_type for Rails Paperclip plugin

    - by Fotios
    I think I have a bit of a chicken and egg problem. I would like to set the content_type of a file uploaded via Paperclip. The problem is that the default content_type is only based on extension, but I'd like to base it on another module. I seem to be able to set the content_type with the before_post_process class Upload < ActiveRecord::Base has_attached_file :upload before_post_process :foo def foo logger.debug "Changing content_type" #This works self.upload.instance_write(:content_type,"foobar") # This fails because the file does not actually exist yet self.upload.instance_write(:content_type,file_type(self.upload.path) end # Returns the filetype based on file command (assume it works) def file_type(path) return `file -ib '#{path}'`.split(/;/)[0] end end But...I cannot base the content type on the file because Paperclip doesn't write the file until after_create. And I cannot seem to set the content_type after it has been saved or with the after_create callback (even back in the controller) So I would like to know if I can somehow get access to the actual file object (assume there are no processors doing anything to the original file) before it is saved, so that I can run the file_type command on that. Or is there a way to modify the content_type after the objects have been created.

  • How do I optimize this postfix expression tree for speed?

    - by Peter Stewart
    Thanks to the help I received in this post: I have a nice, concise recursive function to traverse a tree in postfix order: deque <char*> d; void Node::postfix() { if (left != __nullptr) { left->postfix(); } if (right != __nullptr) { right->postfix(); } d.push_front(cargo); return; }; This is an expression tree. The branch nodes are operators randomly selected from an array, and the leaf nodes are values or the variable 'x', also randomly selected from an array. char *values[10]={"1.0","2.0","3.0","4.0","5.0","6.0","7.0","8.0","9.0","x"}; char *ops[4]={"+","-","*","/"}; As this will be called billions of times during a run of the genetic algorithm of which it is a part, I'd like to optimize it for speed. I have a number of questions on this topic which I will ask in separate postings. The first is: how can I get access to each 'cargo' as it is found. That is: instead of pushing 'cargo' onto a deque, and then processing the deque to get the value, I'd like to start processing it right away. I don't yet know about parallel processing in c++, but this would ideally be done concurrently on two different processors. In python, I'd make the function a generator and access succeeding 'cargo's using .next(). But I'm using c++ to speed up the python implementation. I'm thinking that this kind of tree has been around for a long time, and somebody has probably optimized it already. Any Ideas? Thanks
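
    One way to start processing each cargo immediately, rather than collecting it in the deque first, is to pass a visitor down the recursion. A minimal sketch, assuming the Node layout implied by the code above (left, right, cargo); the visitor type is invented for illustration:

      #include <cstddef>

      struct Node {
          Node* left;
          Node* right;
          char* cargo;

          // Visit children first, then hand cargo straight to the caller's functor,
          // so evaluation can begin during the traversal instead of afterwards.
          template <typename Visitor>
          void postfix(Visitor& visit) const {
              if (left  != NULL) left->postfix(visit);
              if (right != NULL) right->postfix(visit);
              visit(cargo);
          }
      };

      // Example functor: count tokens as they stream past (a real evaluator would
      // push operands and apply operators on its own stack inside operator()).
      struct CountingVisitor {
          int count;
          CountingVisitor() : count(0) {}
          void operator()(char* /*cargo*/) { ++count; }
      };

    Templating on the visitor keeps the per-node call inlinable, which matters at the call rates described here; handing the work to a second processor is a separate change, though the tree itself is read-only and so can be shared safely.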

  • Instanced drawing with OpenGL ES 2.0

    - by Mårten Wikström
    In short: Is it possible to use the gl_InstanceID built-in variable in OpenGL ES 2.0? And, if so, how? Some more info: I want to draw multiple instances of an object using glDrawArraysInstanced and gl_InstanceID, and I want my application to run on multiple platforms, including iOS. The specification clearly says that these features require ES 3.0. According to the iOS Device Compatibility Reference ES 3.0 is only available on a few devices (those based on the A7 GPU; so iPhone 5s, but not on iPhone 5 or earlier). So my first assumption was that I needed to avoid using instanced drawing on older iOS devices. However, further down in the compatibility reference document it says that the EXT_draw_instanced extension is supported for all SGX Series 5 processors (that includes iPhone 5 and 4s). This makes me think that I could indeed use instanced drawing on older iOS devices too, by looking up and using the appropriate extension function (EXT or ARB) for glDrawArraysInstanced. I'm currently just running some test code using SDL and GLEW on Windows so I haven't tested anything on iOS yet. However, in my current setup I'm having trouble using the gl_InstanceID built-in variable in a vertex shader. I'm getting the following error message: 'gl_InstanceID' : variable is not available in current GLSL version Enabling the "draw_instanced" extension in GLSL has no effect: #extension GL_ARB_draw_instanced : enable #extension GL_EXT_draw_instanced : enable The error goes away when I specifically declare that I need ES 3.0 (GLSL 300 ES): #version 300 es Although that seem to work fine on my Windows desktop machine in an ES 2.0 context I doubt that this would work on an iPhone 5. So, shall I abandon the idea of being able to use instanced drawing on older iOS devices?
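
    For what it's worth, a sketch of the runtime check for the ES 2.0 extension path; it assumes an EGL-based context and a gl2ext.h that declares the EXT_draw_instanced typedefs, and it deliberately leaves the shader side alone (which instance-ID built-in the extension exposes, and under what name, is spelled out in the extension spec itself):

      #include <cstring>
      #include <EGL/egl.h>
      #include <GLES2/gl2.h>
      #include <GLES2/gl2ext.h>

      // Returns a callable instanced-draw entry point, or NULL if the extension
      // is not advertised, in which case fall back to one draw call per instance.
      PFNGLDRAWARRAYSINSTANCEDEXTPROC LoadDrawArraysInstancedEXT()
      {
          const char* ext = reinterpret_cast<const char*>(glGetString(GL_EXTENSIONS));
          if (ext == NULL || std::strstr(ext, "GL_EXT_draw_instanced") == NULL)
              return NULL;

          return reinterpret_cast<PFNGLDRAWARRAYSINSTANCEDEXTPROC>(
              eglGetProcAddress("glDrawArraysInstancedEXT"));
      }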

  • Microsecond-resolution timestamps on Windows

    - by Nikhil
    How do I get microsecond-resolution timestamps on Windows? I am looking for something better than QueryPerformanceCounter and QueryPerformanceFrequency (these can only give you an elapsed time since boot, and are not necessarily accurate if they are called on different threads, i.e. QueryPerformanceCounter may return different results on different CPUs. There are also some processors that adjust their frequency for power saving, which apparently isn't always reflected in their QueryPerformanceFrequency result). There is this: http://msdn.microsoft.com/en-us/magazine/cc163996.aspx but it does not seem to be solid. This looks great but it's not available for download anymore: http://www.ibm.com/developerworks/library/i-seconds/ This is another resource: http://www.lochan.org/2005/keith-cl/useful/win32time.html but it requires a number of steps, running a helper program plus some init stuff, and I am not sure if it works on multiple CPUs. I also looked at the Wikipedia article on the subject, which is interesting but not that useful: http://en.wikipedia.org/wiki/Time_Stamp_Counter If the answer is just "do this with BSD or Linux, it's a lot easier", that's fine, but I would like to confirm this and get some explanation as to why this is so hard on Windows and so easy on Linux and BSD. It's the same damn hardware...
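
    For reference, a minimal sketch of the usual QPC workaround: pin the calling thread to a single CPU so successive reads come from one counter source. This is an illustration of the idea, not a guarantee of microsecond accuracy on every chipset, and the statics below are not thread-safe:

      #include <windows.h>

      // Microseconds elapsed since the first call, always read on CPU 0.
      double MicrosecondsSinceFirstCall()
      {
          DWORD_PTR oldMask = SetThreadAffinityMask(GetCurrentThread(), 1);

          static LARGE_INTEGER freq  = { 0 };
          static LARGE_INTEGER start = { 0 };
          if (freq.QuadPart == 0) {
              QueryPerformanceFrequency(&freq);
              QueryPerformanceCounter(&start);
          }

          LARGE_INTEGER now;
          QueryPerformanceCounter(&now);

          SetThreadAffinityMask(GetCurrentThread(), oldMask);   // restore affinity

          return (now.QuadPart - start.QuadPart) * 1000000.0 / freq.QuadPart;
      }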

  • Fastest inline-assembly spinlock

    - by sigvardsen
    I'm writing a multithreaded application in C++, where performance is critical. I need to use a lot of locking while copying small structures between threads, and for this I have chosen to use spinlocks. I have done some research and speed testing on this and I found that most implementations are roughly equally fast: Microsoft's CRITICAL_SECTION, with SpinCount set to 1000, scores about 140 time units. Implementing this algorithm with Microsoft's InterlockedCompareExchange scores about 95 time units. I've also tried to use some inline assembly with __asm {} using something like this code, and it scores about 70 time units, but I am not sure that a proper memory barrier has been created. Edit: The times given here are the time it takes for 2 threads to lock and unlock the spinlock 1,000,000 times. I know this isn't a lot of difference, but as a spinlock is a heavily used object, one would think that programmers would have agreed on the fastest possible way to make a spinlock. Googling it leads to many different approaches, however. I would think this aforementioned method would be the fastest if implemented using inline assembly and using the instruction CMPXCHG8B instead of comparing 32-bit registers. Furthermore, memory barriers must be taken into account; this could be done by LOCK CMPXCHG8B (I think?), which guarantees "exclusive rights" to the shared memory between cores. Lastly, some suggest that busy waits should be accompanied by REP NOP (PAUSE), which would enable Hyper-Threading processors to switch to another thread, but I am not sure whether this is true or not. From my performance tests of different spinlocks it is seen that there is not much difference, but for purely academic purposes I would like to know which one is fastest. However, as I have extremely limited experience with assembly language and memory barriers, I would be happy if someone could write the assembly code for the last example I provided, with LOCK CMPXCHG8B and proper memory barriers, in the following template: __asm { spin_lock: ;locking code. spin_unlock: ;unlocking code. }
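
    Not the hand-written LOCK CMPXCHG8B version being asked for, but as a baseline for comparison, a minimal sketch using the compiler intrinsics (they emit an implicitly locked xchg plus the full memory barrier for you); treat it as illustrative rather than a tuned implementation:

      #include <windows.h>   // InterlockedExchange, YieldProcessor

      class Spinlock {
          volatile LONG state_;                 // 0 = free, 1 = held
      public:
          Spinlock() : state_(0) {}

          void lock() {
              // InterlockedExchange is a full barrier (a locked xchg under the hood).
              while (InterlockedExchange(&state_, 1) != 0) {
                  while (state_ != 0)
                      YieldProcessor();         // expands to pause / rep nop on x86
              }
          }

          void unlock() {
              InterlockedExchange(&state_, 0);  // barrier on release as well
          }
      };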

  • Testing approach for multi-threaded software

    - by Shane MacLaughlin
    I have a piece of mature geospatial software that has recently had areas rewritten to take better advantage of the multiple processors available in modern PCs. Specifically, display, GUI, spatial searching, and main processing have all been hived off to separate threads. The software has a pretty sizeable GUI automation suite for functional regression, and another smaller one for performance regression. While all automated tests are passing, I'm not convinced that they provide nearly enough coverage in terms of finding bugs relating to race conditions, deadlocks, and other nasties associated with multi-threading. What techniques would you use to see if such bugs exist? What techniques would you advocate for rooting them out, assuming there are some in there to root out? What I'm doing so far is running the GUI functional automation on the app running under a debugger, such that I can break out of deadlocks and catch crashes, and I plan to make a bounds-checker build and repeat the tests against that version. I've also carried out a static analysis of the source via PC-Lint in the hope of locating potential deadlocks, but have not had any worthwhile results. The application is C++, MFC, multiple document/view, with a number of threads per doc. The locking mechanism I'm using is based on an object that includes a pointer to a CMutex, which is locked in the ctor and freed in the dtor. I use local variables of this object to lock various bits of code as required, and my mutex has a timeout that fires a warning if the timeout is reached. I avoid locking where possible, using resource copies instead where possible. What other tests would you carry out?

  • Is there an IDE/compiler PC benchmark I can use to compare my PC's performance?

    - by RickL
    I'm looking for a benchmark (and results on other PCs) which would give me an idea of the development performance gain I could get by upgrading my PC; the benchmark could also be used to justify the upgrade to my boss. I use Visual Studio 2008 for my development, so I'd like to get an idea of by what factor the build times would be improved, and it would also be good if the benchmark could incorporate IDE performance (i.e. when editing, using IntelliSense, opening code files, etc.) into its result. I currently have an AMD 3800x2, with 2GB RAM on Vista 32. For example, I'd like to know what kind of performance gain I'd see in Visual Studio 2008 with a Q6600 and 4GB RAM on Vista 64, and also with other processors and other RAM sizes... and also see whether hard disk performance is a big factor. EDIT: I mentioned Vista 64 because I'm aware that Vista 32 can only use 3GB RAM maximum. So I'd presume that wanting to use more RAM would require Vista 64, but perhaps it could still be slower overall if there is a large overhead in using the 32-bit VS 2008 on a 64-bit OS.

  • OpenMP: Get total number of running threads

    - by Konrad Rudolph
    I need to know the total number of threads that my application has spawned via OpenMP. Unfortunately, the omp_get_num_threads() function does not work here since it only yields the number of threads in the current team. However, my code runs recursively (divide and conquer, basically) and I want to spawn new threads as long as there are still idle processors, but no more. Is there a way to get around the limitations of omp_get_num_threads and get the total number of running threads? If more detail is required, consider the following pseudo-code that models my workflow quite closely: function divide_and_conquer(Job job, int total_num_threads): if job.is_leaf(): # Recurrence base case. job.process() return left, right = job.divide() current_num_threads = omp_get_num_threads() if current_num_threads < total_num_threads: # (1) #pragma omp parallel num_threads(2) #pragma omp section divide_and_conquer(left, total_num_threads) #pragma omp section divide_and_conquer(right, total_num_threads) else: divide_and_conquer(left, total_num_threads) divide_and_conquer(right, total_num_threads) job = merge(left, right) If I call this code with a total_num_threads value of 4, the conditional annotated with (1) will always evaluate to true (because each thread team will contain at most two threads) and thus the code will always spawn two new threads, no matter how many threads are already running at a higher level. I am searching for a platform-independent way of determining the total number of threads that are currently running in my application.
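
    One portable workaround, sketched rather than drop-in (the counter and helper names are invented here), is to keep your own count of the threads the recursion has claimed and consult it at the decision point marked (1):

      #include <omp.h>

      static int g_threads_in_use = 1;            // the initial/master thread

      // Reserve one extra worker if the budget allows; called at point (1).
      bool try_reserve_thread(int total_num_threads)
      {
          bool granted = false;
          #pragma omp critical(thread_budget)
          {
              if (g_threads_in_use < total_num_threads) {
                  ++g_threads_in_use;
                  granted = true;
              }
          }
          return granted;
      }

      // Give the slot back once the extra worker's branch of the recursion is done.
      void release_thread()
      {
          #pragma omp critical(thread_budget)
          --g_threads_in_use;
      }

    This only approximates "currently running" (a reserved slot counts until it is released), but it is enough to stop the recursion from oversubscribing once total_num_threads workers are busy.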

  • Link failure with either abnormal memory consumption or LNK1106 in Visual Studio 2005.

    - by Corvin
    Hello, I am trying to build a solution for windows XP in Visual Studio 2005. This solution contains 81 projects (static libs, exe's, dlls) and is being successfully used by our partners. I copied the solution bundle from their repository and tried setting it up on 3 similar machines of people in our group. I was successful on two machines and the solution failed to build on my machine. The build on my machine encountered two problems: During a simple build creation of the biggest static library (about 522Mb in debug mode) would fail with the message "13libd\ui1d.lib : fatal error LNK1106: invalid file or disk full: cannot seek to 0x20101879" Full solution rebuild creates this library, however when it comes to linking the library to main .exe file, devenv.exe spawns link.exe which consumes about 80Mb of physical memory and 250MB of virtual and spawns another link.exe, which does the same. This goes on until the system runs out of memory. On PCs of my colleagues where successful build could be performed, there is only one link.exe process which uses all the memory required for linking (about 500Mb physical). There is a plenty of hard drive space on my machine and the file system is NTFS. All three of our systems are similar - Core2Quad processors, 4Gb of RAM, Windows XP SP3. We are using Visual studio installed from the same source. I tried using a different RAM and CPU, using dedicated graphics adapter to eliminate possibility of video memory sharing influencing the build, putting solution files to different location, using different versions of VS 2005 (Professional, Standard and Team Suite), changing the amount of available virtual memory, running memtest86 and building the project from scratch (i.e. a clean bundle). I have read what MSDN says about LNK1106, none of the cases apply to me except for maybe "out of heap space", however I am not sure how I should fight this. The only idea that I have left is reinstalling the OS, however I am not sure that it would help and I am not sure that my situation wouldn't repeat itself on a different machine. Would anyone have any sort of advice for me? Thanks

  • Multiple plugin instance loading with MEF

    - by Dave
    In my last application, using MEF to load plugins went just fine, but now I'm running into a new issue. I have a solution for it that I explain at the end of this question, but I'm looking for other ways to do it. Let's say I have an interface called ApplianceInterface. I also have two plugins that inherit from ApplianceInterface; let's call them Blender and Processor. Now, I would like to have multiple Blenders and Processors in my application, but I am not sure how to instantiate them properly. Before, I would simply use the ImportMany attribute and upon calling ComposeParts, my application would load Blender and Processor. For example: [ImportMany(typeof(ApplianceInterface))] private IEnumerable<ApplianceInterface> Appliances { get; set; } and my Blender and Processor plugins would be attributed like this: [PartCreationPolicy(CreationPolicy.NonShared)] [Export(typeof(MyInterface))] public class Blender : ApplianceInterface { ... } but what this ends up doing for me is populating Appliances with one Blender and one Processor. I need to be able to create an arbitrary number of Blender and Processor objects. Now, from the documentation I understand that [PartCreationPolicy(CreationPolicy.NonShared)] is what allows MEF to create a new instance each time, but is there a similar "magical" way to create a specific number of instances of something using MEF? Up until now, I've relied on [Import] and [ImportMany] to resolve the assemblies. Is my only option to use a global container and then resolve the export manually using GetExportedValue<T>? I have tried GetExportedValue<T> and that implementation does work fine for me, but I was just curious if there is a better, more accepted way to do it.
