This subject, as with any optimisation problem, comes up a lot, but I just couldn't find what I (think I) want.
A lot of tutorials, and even SO questions, have similar tips, generally covering:
Use GL face culling (the OpenGL function, not the scene logic)
Send only one matrix to the GPU (the combined projection and model-view matrix), so that the MVP combination is computed once per model rather than once per vertex (as it should be).
Use interleaved vertices
Minimize the number of GL calls; batch where appropriate
And possibly a few/many others. Out of curiosity, I am rendering 28 million triangles in my application using several vertex buffers. I have tried all of the above techniques (to the best of my knowledge) and saw almost no change in performance.
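To make two of the tips above concrete (the combined matrix and interleaved vertices), here is a minimal sketch in plain C++ that needs no GL context. It is not my actual rendering code: the `Vertex` layout, the `multiply`/`transform` helpers, the example matrices and the `mvpLocation` name in the comment are all assumptions made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Tip: interleaved vertices. All attributes of one vertex sit side by side,
// so a single buffer with one stride serves every attribute.
// (Hypothetical layout; a real mesh may carry different attributes.)
struct Vertex {
    float position[3];
    float normal[3];
    float texCoord[2];
};
static_assert(sizeof(Vertex) == 8 * sizeof(float), "Vertex has unexpected padding");

// Tip: one combined matrix. Column-major storage: element (row, col) lives at
// index col * 4 + row, which is the layout glUniformMatrix4fv reads when its
// transpose argument is GL_FALSE.
using Mat4 = std::array<float, 16>;
using Vec4 = std::array<float, 4>;

// Returns a * b, so b is applied to a vertex first and a second.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};  // zero-initialised
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                r[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
    return r;
}

// Returns m * v. With a combined matrix this is the only matrix work
// left to do per vertex.
Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 r{};
    for (int row = 0; row < 4; ++row)
        for (int k = 0; k < 4; ++k)
            r[row] += m[k * 4 + row] * v[k];
    return r;
}

void example() {
    // Stand-in matrices: uniform scale by 2, then translate by (1, 2, 3).
    const Mat4 scale     = {2, 0, 0, 0,  0, 2, 0, 0,  0, 0, 2, 0,  0, 0, 0, 1};
    const Mat4 translate = {1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  1, 2, 3, 1};

    // Combined once per model on the CPU, then uploaded with something like
    //   glUniformMatrix4fv(mvpLocation, 1, GL_FALSE, mvp.data());
    const Mat4 combined = multiply(translate, scale);

    const Vec4 v = {1, 1, 1, 1};
    const Vec4 p = transform(combined, v);
    assert(p[0] == 3 && p[1] == 4 && p[2] == 5 && p[3] == 1);

    // Stride is sizeof(Vertex); these are the byte offsets that would be
    // passed as the last argument of glVertexAttribPointer.
    assert(offsetof(Vertex, normal) == 3 * sizeof(float));
    assert(offsetof(Vertex, texCoord) == 6 * sizeof(float));
}
```

Note that the combined matrix only removes the matrix-by-matrix product from the per-vertex path; each vertex is still multiplied by one matrix, which may be part of why this tip makes little visible difference on its own.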
While I am getting around 40 FPS in my implementation, which is by no means problematic, I am still curious as to where these optimisation 'tips' actually come into play.
My CPU usage hovers around 20-50% during rendering, so I assume I am GPU-bound when it comes to increasing performance.
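For a sense of scale, this is what the figures quoted above work out to. It is a rough sketch using only the numbers in this question; the constant names are made up for illustration, and it assumes all 28 million triangles are submitted every frame.

```cpp
// Figures quoted in the question.
constexpr long long kTrianglesPerFrame = 28000000LL;
constexpr int kFramesPerSecond = 40;

// Triangles submitted per second: 28,000,000 * 40 = 1,120,000,000.
constexpr long long kTrianglesPerSecond = kTrianglesPerFrame * kFramesPerSecond;

// Time available per frame at 40 FPS: 1000 ms / 40 = 25 ms.
constexpr double kFrameBudgetMs = 1000.0 / kFramesPerSecond;

static_assert(kTrianglesPerSecond == 1120000000LL, "triangle throughput");
```

So the GPU is being asked to process on the order of a billion triangles per second, within a budget of 25 ms per frame.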
Note: I am looking into gDEBugger at the moment
Cross posted at Game Development