One of the stated reasons for knowing assembler is that, on occasion, it can be employed to write code that will be more performant than writing that code in a higher-level language, C in particular. However, I've also heard it stated many times that although that's not entirely false, the cases where assembler can actually be used to generate more performant code are both extremely rare and require expert knowledge of and experience with assembler.
This question doesn't even get into the fact that assembler instructions will be machine-specific and non-portable, or any of the other aspects of assembler. There are plenty of good reasons for knowing assembler besides this one, of course, but this is meant to be a specific question soliciting examples and data, not an extended discourse on assembler versus higher-level languages.
Can anyone provide some specific examples of cases where assembler will be faster than well-written C code using a modern compiler, and can you support that claim with profiling evidence? I am pretty confident these cases exist, but I really want to know exactly how esoteric these cases are, since it seems to be a point of some contention.