I know the title is a bit vague; it's hard to describe exactly what I'm looking for, but here goes.
When it comes to rendering on the CPU, performance is mostly straightforward to estimate, but on the GPU side I'm clueless, due to my lack of technical background. I'm using XNA, so it would be nice if the theory could be related to that.
So what I actually want to know is: what happens, and where (CPU or GPU), when you perform specific draw actions? Specifically:

- What is a batch?
- What influence do effects, projections, etc. have?
- Is data persisted on the graphics card, or is it transferred over on every step?
- When there's talk about bandwidth, does that mean the graphics card's internal bandwidth, or the pipeline from the CPU to the GPU?
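To make the question concrete, here's roughly the kind of per-frame draw code I mean (a simplified XNA 4.0 sketch; the texture, effect, and buffer fields are placeholders from my imagination, not a real project, and LoadContent is omitted for brevity):

    using Microsoft.Xna.Framework;
    using Microsoft.Xna.Framework.Graphics;

    public class GameSketch : Game
    {
        GraphicsDeviceManager graphics;
        SpriteBatch spriteBatch;
        Texture2D myTexture;        // loaded once in LoadContent
        BasicEffect myEffect;       // created once in LoadContent
        VertexBuffer vertexBuffer;  // static geometry, filled once
        IndexBuffer indexBuffer;

        public GameSketch()
        {
            graphics = new GraphicsDeviceManager(this);
        }

        protected override void Draw(GameTime gameTime)
        {
            GraphicsDevice.Clear(Color.CornflowerBlue);

            // 2D: what work happens on the CPU vs. the GPU
            // between Begin and End? Is this one "batch"?
            spriteBatch.Begin();
            spriteBatch.Draw(myTexture, Vector2.Zero, Color.White);
            spriteBatch.End();

            // 3D: does setting these matrices cost anything per frame,
            // or only when the effect pass is applied?
            myEffect.World = Matrix.Identity;
            myEffect.View = Matrix.CreateLookAt(
                new Vector3(0f, 0f, 5f), Vector3.Zero, Vector3.Up);
            myEffect.Projection = Matrix.CreatePerspectiveFieldOfView(
                MathHelper.PiOver4,
                GraphicsDevice.Viewport.AspectRatio, 0.1f, 100f);

            // Are these buffers re-uploaded to the card every frame,
            // or do they live in GPU memory after creation?
            GraphicsDevice.SetVertexBuffer(vertexBuffer);
            GraphicsDevice.Indices = indexBuffer;

            foreach (EffectPass pass in myEffect.CurrentTechnique.Passes)
            {
                pass.Apply();
                GraphicsDevice.DrawIndexedPrimitives(
                    PrimitiveType.TriangleList, 0, 0,
                    vertexBuffer.VertexCount, 0,
                    indexBuffer.IndexCount / 3);
            }

            base.Draw(gameTime);
        }
    }

The comments mark the spots where I don't know what actually happens under the hood.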
Note: I'm not actually looking for information on how the drawing process itself happens; that's the GPU's business. I'm interested in all the overhead that precedes it.
I'd like to understand what's going on when I perform action X, so I can adapt my architecture and practices accordingly.
Any articles (possibly with code examples), information, links, or tutorials that give more insight into how to write better games are very much appreciated. Thanks :)