In "modern" environments, the "NV Occlusion Query" extension provides a method to get the number of fragments which passed the depth test. However, on the iPad / iPhone using OpenGL ES, the extension is not available.
What is the most performant approach to implement a similar behaviour in the fragment shader?
Some of my ideas:
Render the object completely in white, then count all the colors together using a two-pass shader where first a vertical line is rendered and for each fragment the shader computes the sum over the whole row. Then, a single vertex is rendered whose fragment sums all the partial sums of the first pass. Doesn't seem to be very efficient.
Render the object completely in white over a black background. Downsample recursively, abusing the hardware linear interpolation between textures, until reaching a reasonably small resolution. This leads to fragments whose greyscale level depends on the number of white pixels in their corresponding region. Is this even accurate enough?
Use mipmaps and simply read the pixel on the 1x1 level. Again the question of accuracy and if it is even possible using non-power-of-two textures.
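To get a feel for the accuracy question in the last two ideas, here is a minimal CPU-side sketch in Python. It assumes a square power-of-two target, a single 8-bit channel, and a plain 2x2 box filter that rounds to the nearest level at every step. Real hardware filtering and rounding are implementation dependent, and the function names are made up for illustration.

```python
def downsample_once(img):
    """Average each 2x2 block and round to the nearest 8-bit level (assumed rounding)."""
    out = []
    for y in range(0, len(img), 2):
        row = []
        for x in range(0, len(img[0]), 2):
            total = img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]
            row.append((total + 2) // 4)  # integer round-to-nearest of total / 4
        out.append(row)
    return out


def estimate_white_pixels(mask):
    """mask: square power-of-two grid of 0/1 values (1 = white pixel).

    Reduces the grid down to 1x1 and converts the remaining grey level
    back into an estimated pixel count.
    """
    size = len(mask)
    img = [[255 * v for v in row] for row in mask]
    while len(img) > 1:
        img = downsample_once(img)
    return img[0][0] / 255 * size * size


if __name__ == "__main__":
    mask = [[0] * 256 for _ in range(256)]
    mask[0][0] = 1  # a single visible pixel
    print(estimate_white_pixels(mask))
```

Under these assumptions a single white pixel in a 256x256 target is rounded away completely before the 1x1 level is reached. Even without intermediate rounding, one 8-bit channel at the 1x1 level can only distinguish 256 grey levels, so on a 256x256 target each level corresponds to roughly 257 pixels.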
The problem with these approaches is that the pipeline gets stalled, which results in major performance issues. Therefore, I'm looking for a more performant way to accomplish my goal.
Using the EXT_OCCLUSION_QUERY_BOOLEAN extension
Apple introduced EXT_OCCLUSION_QUERY_BOOLEAN in iOS 5.0 for iPad 2.
"4.1.6 Occlusion Queries
Occlusion queries use query objects to track the number of fragments or
samples that pass the depth test. An occlusion query can be started and
finished by calling BeginQueryEXT and EndQueryEXT, respectively, with a
target of ANY_SAMPLES_PASSED_EXT or ANY_SAMPLES_PASSED_CONSERVATIVE_EXT.
When an occlusion query is started with the target
ANY_SAMPLES_PASSED_EXT, the samples-boolean state maintained by the GL is
set to FALSE. While that occlusion query is active, the samples-boolean
state is set to TRUE if any fragment or sample passes the depth test. When
the occlusion query finishes, the samples-boolean state of FALSE or TRUE is
written to the corresponding query object as the query result value, and
the query result for that object is marked as available. If the target of
the query is ANY_SAMPLES_PASSED_CONSERVATIVE_EXT, an implementation may
choose to use a less precise version of the test which can additionally set
the samples-boolean state to TRUE in some other implementation dependent
cases."
The first sentence hints at exactly the behavior I'm looking for: getting the number of pixels which passed the depth test asynchronously and without much performance loss. However, the rest of the document only describes how to get boolean results.
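To make that limitation concrete, here is a toy Python model of the samples-boolean state as I read it from the quoted text. It is not a GL binding; the class and method names are hypothetical and only mirror the begin / end flow of the spec.

```python
class BooleanOcclusionQueryModel:
    """Toy model of the samples-boolean state from the quoted spec text."""

    def __init__(self):
        self.active = False
        self.samples_boolean = False
        self.result = None  # None means "no result available yet"

    def begin(self):
        # Starting a query resets the samples-boolean state to FALSE.
        self.active = True
        self.samples_boolean = False

    def fragment(self, passed_depth_test):
        # Any passing fragment flips the state to TRUE; nothing is counted.
        if self.active and passed_depth_test:
            self.samples_boolean = True

    def end(self):
        # Finishing the query writes the boolean to the query object.
        self.active = False
        self.result = self.samples_boolean


if __name__ == "__main__":
    query = BooleanOcclusionQueryModel()
    query.begin()
    for passed in [False, True, True, True]:
        query.fragment(passed)
    query.end()
    print(query.result)
```

In this model one passing fragment and three passing fragments produce the same result, so the count cannot be recovered from the query object alone.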
Is it possible to exploit this extension to get the pixel count? Does the hardware support it so that there may be hidden API to get access to the pixel count?
Other extensions that could be exploited would be debugging features such as the number of times the fragment shader was invoked (PSInvocations in DirectX; I'm not sure whether something similar is available in OpenGL ES). However, this would also result in a pipeline stall.