I implemented an RNS Montgomery exponentiation in CUDA, plus a CPU version for comparison.
Everything works fine. It runs on just one SM.
However, I want to tell you about a strange regression in both CPU and GPU performance.
During development, about two months ago, I was using the CUDA 5 preview on Ubuntu 11.04 64-bit.
At that time, I was getting the following timings:
cpu 460ms gpu 120ms
Then one day, when I turned on the PC, the graphical environment didn't start. I don't know what the problem was, but I switched to the console and reinstalled the CUDA driver. After the next boot the timings changed:
cpu 310ms gpu 80ms
I was like Q.Q... uhm, ok, nice to see this, but I was wondering how that could be possible.
Anyway, I then went on holiday for 10 days and continued developing and optimizing on my notebook (not the same part of the code, just some additional stuff).
When I got back, I just updated the source files, and the timings went back to 460/120 ms.
I couldn't believe it. I tried installing CUDA 5 RC and updating the video driver too... nothing changed.
I checked Debug/Release and the CUDA compute capability, but the problem seems to be somewhere else.
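One more thing that could be ruled out is measurement noise, since a single run can be skewed by things like CPU frequency scaling or a cold start. Below is a minimal sketch of a repeated-timing helper, assuming the numbers above come from single wall-clock runs (the post does not say how they were measured). The names `TimingStats`, `summarize` and `time_runs` are hypothetical and not part of the original code.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Summary of several timing samples, all in milliseconds.
struct TimingStats {
    double min_ms;
    double median_ms;
    double max_ms;
};

// Sort the samples and report minimum, median and maximum.
// Takes the vector by value so the caller's data is left untouched.
TimingStats summarize(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    // For an even count, the median is the mean of the two middle values.
    const double median = (n % 2 == 1)
        ? samples[n / 2]
        : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
    return TimingStats{samples.front(), median, samples.back()};
}

// Run `work` `runs` times and time each call with a monotonic clock.
template <typename F>
TimingStats time_runs(F&& work, std::size_t runs) {
    assert(runs > 0);
    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
    return summarize(samples);
}
```

Calling something like `time_runs([&] { run_cpu_version(); }, 20)` (with `run_cpu_version` standing in for the real entry point) and comparing the minimum against the median would show whether the 460 ms and 310 ms figures are stable or just outliers. For the GPU path the callable would need to end with `cudaDeviceSynchronize()`, since kernel launches return before the kernel finishes.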
Looking around the net I found this. I am pretty sure it must have something to do with the driver, because the performance change affected both the CPU and the GPU.
Do you have some tips/ideas/suggestions?