How Do You Profile & Optimize CUDA Kernels?

Posted by John Dibling on Stack Overflow
Published on 2010-02-05T01:40:16Z

I am somewhat familiar with the CUDA visual profiler and the occupancy spreadsheet, although I am probably not leveraging them as well as I could. Profiling & optimizing CUDA code is not like profiling & optimizing code that runs on a CPU. So I am hoping to learn from your experiences about how to get the most out of my code.
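The occupancy spreadsheet essentially takes the minimum of several per-multiprocessor limits: threads, blocks, registers, and shared memory. Below is a rough C++ sketch of that arithmetic. It assumes GT200-era (compute capability 1.3) limits as defaults. It also ignores the register and shared-memory allocation granularity that the real spreadsheet accounts for, so it may overestimate occupancy. `SmLimits`, `KernelUsage`, and `estimateOccupancy` are hypothetical names chosen for illustration.

```cpp
#include <algorithm>

// Hypothetical per-multiprocessor limits. The defaults are roughly those of a
// compute capability 1.3 (GT200-class) GPU; check the values for your device.
struct SmLimits {
    int maxThreads = 1024;    // resident threads per SM
    int maxBlocks = 8;        // resident blocks per SM
    int registers = 16384;    // 32-bit registers per SM
    int sharedBytes = 16384;  // shared memory per SM, in bytes
    int warpSize = 32;
};

// Per-kernel resource usage, e.g. as reported by the compiler's verbose output.
struct KernelUsage {
    int threadsPerBlock;
    int registersPerThread;
    int sharedBytesPerBlock;
};

// Simplified occupancy: resident warps divided by the maximum resident warps.
// Allocation granularity is ignored, so this can be higher than the spreadsheet.
double estimateOccupancy(const KernelUsage& k, const SmLimits& sm = SmLimits{}) {
    int byThreads = sm.maxThreads / k.threadsPerBlock;
    int regsPerBlock = k.registersPerThread * k.threadsPerBlock;
    int byRegisters = regsPerBlock > 0 ? sm.registers / regsPerBlock : sm.maxBlocks;
    int byShared = k.sharedBytesPerBlock > 0 ? sm.sharedBytes / k.sharedBytesPerBlock
                                             : sm.maxBlocks;
    // The scarcest resource decides how many blocks fit on one SM.
    int blocks = std::min({sm.maxBlocks, byThreads, byRegisters, byShared});
    int warpsPerBlock = (k.threadsPerBlock + sm.warpSize - 1) / sm.warpSize;
    int maxWarps = sm.maxThreads / sm.warpSize;
    return static_cast<double>(blocks * warpsPerBlock) / maxWarps;
}
```

For example, under these assumed limits, 256 threads per block using 20 registers per thread and 2 KB of shared memory is register-limited to 3 resident blocks. That gives 24 of 32 warps (0.75). Trimming usage to 16 registers per thread allows 4 blocks and full occupancy. Higher occupancy does not by itself guarantee a faster kernel, but this calculation shows which resource to look at first.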

There was a recent post asking for the fastest possible code to identify self numbers, and I provided a CUDA implementation. I'm not satisfied that this code is as fast as it can be, but I'm at a loss as to both what the right questions are and which tool can answer them.

How do you identify ways to make your CUDA kernels perform faster?
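One common first question is whether a kernel is limited by memory bandwidth. To answer it, compare the kernel's effective bandwidth against the device's theoretical peak. Below is a minimal C++ sketch of that arithmetic. It assumes the elapsed time was measured around the kernel launch with CUDA events (`cudaEventRecord` / `cudaEventElapsedTime`, which reports milliseconds). It also assumes double-data-rate device memory. The memory clock (kHz) and bus width (bits) correspond to the `memoryClockRate` and `memoryBusWidth` fields of `cudaDeviceProp` in later CUDA releases. The function names are hypothetical.

```cpp
// Effective bandwidth in GB/s: total bytes moved divided by elapsed time.
// elapsedMs is in milliseconds, as reported by cudaEventElapsedTime.
double effectiveBandwidthGBs(double bytesRead, double bytesWritten, double elapsedMs) {
    // bytes / (ms * 1e-3 s/ms) / (1e9 bytes/GB) == bytes / (ms * 1e6)
    return (bytesRead + bytesWritten) / (elapsedMs * 1.0e6);
}

// Theoretical peak in GB/s, assuming double-data-rate memory:
// 2 transfers per clock * clock (kHz -> Hz) * bus width (bits -> bytes).
double peakBandwidthGBs(double memoryClockKHz, int busWidthBits) {
    return 2.0 * memoryClockKHz * 1.0e3 * (busWidthBits / 8.0) / 1.0e9;
}

// Fraction of peak achieved. A value well below 1 for a kernel that mostly
// moves data suggests examining its global memory access pattern.
double fractionOfPeak(double effectiveGBs, double peakGBs) {
    return effectiveGBs / peakGBs;
}
```

Consider a kernel that reads 4M floats and writes 4M floats (16 MiB each way) in 0.5 ms. It achieves about 67 GB/s. A 512-bit bus at 1107 MHz has a peak of about 142 GB/s under the same assumption, so the kernel reaches under half of the peak. That would suggest examining its memory access pattern (for example, uncoalesced loads) before tuning arithmetic.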

© Stack Overflow or respective owner