Search Results

Search found 7 results on 1 pages for 'etan'.

Page 1/1 | 1 

  • Getting the number of fragments which passed the depth test

    - by Etan
    In "modern" environments, the "NV Occlusion Query" extension provides a method to get the number of fragments which passed the depth test. However, on the iPad / iPhone using OpenGL ES, the extension is not available. What is the most performant approach to implement a similar behaviour in the fragment shader? Some of my ideas: Render the object completely in white, then count all the colors together using a two-pass shader where first a vertical line is rendered and for each fragment the shader computes the sum over the whole row. Then, a single vertex is rendered whose fragment sums all the partial sums of the first pass. Doesn't seem to be very efficient. Render the object completely in white over a black background. Downsample recursively, abusing the hardware linear interpolation between textures until being at a reasonably small resolution. This leads to fragments which have a greyscale level depending on the number of white pixels where in their corresponding region. Is this even accurate enough? Use mipmaps and simply read the pixel on the 1x1 level. Again the question of accuracy and if it is even possible using non-power-of-two textures. The problem wit these approaches is, that the pipeline gets stalled which results in major performance issues. Therefore, I'm looking for a more performant way to accomplish my goal. Using the EXT_OCCLUSION_QUERY_BOOLEAN extension Apple introduced EXT_OCCLUSION_QUERY_BOOLEAN in iOS 5.0 for iPad 2. "4.1.6 Occlusion Queries Occlusion queries use query objects to track the number of fragments or samples that pass the depth test. An occlusion query can be started and finished by calling BeginQueryEXT and EndQueryEXT, respectively, with a target of ANY_SAMPLES_PASSED_EXT or ANY_SAMPLES_PASSED_CONSERVATIVE_EXT. When an occlusion query is started with the target ANY_SAMPLES_PASSED_EXT, the samples-boolean state maintained by the GL is set to FALSE. While that occlusion query is active, the samples-boolean state is set to TRUE if any fragment or sample passes the depth test. When the occlusion query finishes, the samples-boolean state of FALSE or TRUE is written to the corresponding query object as the query result value, and the query result for that object is marked as available. If the target of the query is ANY_SAMPLES_PASSED_CONSERVATIVE_EXT, an implementation may choose to use a less precise version of the test which can additionally set the samples-boolean state to TRUE in some other implementation dependent cases." The first sentence hints on a behavior which is exactly what I'm looking for: getting the number of pixels which passed the depth test in an asynchronous manner without much performance loss. However, the rest of the document describes only how to get boolean results. Is it possible to exploit this extension to get the pixel count? Does the hardware support it so that there may be hidden API to get access to the pixel count? Other extensions which could be exploitable would be debugging features like the number of times the fragment shader was invoked (PSInvocations in DirectX - not sure if something simila is available in OpenGL ES). However, this would also result in a pipeline stall.

    Read the article

  • Calculating vertex normals on the GPU

    - by Etan
    I have some height-map sampled on a regular grid stored in an array. Now, I want to use the normals on the sampled vertices for some smoothing algorithm. The way I'm currently doing it is as follows: For each vertex, generate triangles to all it's neighbours. This results in eight neighbours when using the 1-neighbourhood for all vertices except at the borders. +---+---+ ¦ \ ¦ / ¦ +---o---+ ¦ / ¦ \ ¦ +---+---+ For each adjacent triangle, calculate it's normal by taking the cross product between the two distances. As the triangles all have the same size when projected on the xy-plane, I simply average over all eight normals then and store it for this vertex. However, as my data grows larger, this approach takes too much time and I would prefer doing it on the GPU in a shader code. Is there an easy method, maybe if I could store my height-map as a texture?

    Read the article

  • Alternative to NV Occlusion Query - getting the number of fragments which passed the depth test

    - by Etan
    In "modern" environments, the "NV Occlusion Query" extension provide a method to get the number of fragments which passed the depth test. However, on the iPad / iPhone using OpenGL ES, the extension is not available. What is the most performant approach to implement a similar behaviour in the fragment shader? Some of my ideas: Render the object completely in white, then count all the colors together using a two-pass shader where first a vertical line is rendered and for each fragment the shader computes the sum over the whole row. Then, a single vertex is rendered whose fragment sums all the partial sums of the first pass. Doesn't seem to be very efficient. Render the object completely in white over a black background. Downsample recursively, abusing the hardware linear interpolation between textures until being at a reasonably small resolution. This leads to fragments which have a greyscale level depending on the number of white pixels where in their corresponding region. Is this even accurate enough? ... ?

    Read the article

  • Hooking DirectX EndScene from an injected DLL

    - by Etan
    I want to detour EndScene from an arbitrary DirectX 9 application to create a small overlay. As an example, you could take the frame counter overlay of FRAPS, which is shown in games when activated. I know the following methods to do this: Creating a new d3d9.dll, which is then copied to the games path. Since the current folder is searched first, before going to system32 etc., my modified DLL gets loaded, executing my additional code. Downside: You have to put it there before you start the game. Same as the first method, but replacing the DLL in system32 directly. Downside: You cannot add game specific code. You cannot exclude applications where you don't want your DLL to be loaded. Getting the EndScene offset directly from the DLL using tools like IDA Pro 4.9 Free. Since the DLL gets loaded as is, you can just add this offset to the DLL starting address, when it is mapped to the game, to get the actual offset, and then hook it. Downside: The offset is not the same on every system. Hooking Direct3DCreate9 to get the D3D9, then hooking D3D9-CreateDevice to get the device pointer, and then hooking Device-EndScene through the virtual table. Downside: The DLL cannot be injected, when the process is already running. You have to start the process with the CREATE_SUSPENDED flag to hook the initial Direct3DCreate9. Creating a new Device in a new window, as soon as the DLL gets injected. Then, getting the EndScene offset from this device and hooking it, resulting in a hook for the device which is used by the game. Downside: as of some information I have read, creating a second device may interfere with the existing device, and it may bug with windowed vs. fullscreen mode etc. Same as the third method. However, you'll do a pattern scan to get EndScene. Downside: doesn't look that reliable. How can I hook EndScene from an injected DLL, which may be loaded when the game is already running, without having to deal with different d3d9.dll's on other systems, and with a method which is reliable? How does FRAPS for example perform it's DirectX hooks? The DLL should not apply to all games, just to specific processes where I inject it via CreateRemoteThread.

    Read the article

  • Optimizing a shared buffer in a producer/consumer multithreaded environment

    - by Etan
    I have some project where I have a single producer thread which writes events into a buffer, and an additional single consumer thread which takes events from the buffer. My goal is to optimize this thing for a single machine to achieve maximum throughput. Currently, I am using some simple lock-free ring buffer (lock-free is possible since I have only one consumer and one producer thread and therefore the pointers are only updated by a single thread). #define BUF_SIZE 32768 struct buf_t { volatile int writepos; volatile void * buffer[BUF_SIZE]; volatile int readpos;) }; void produce (buf_t *b, void * e) { int next = (b->writepos+1) % BUF_SIZE; while (b->readpos == next); // queue is full. wait b->buffer[b->writepos] = e; b->writepos = next; } void * consume (buf_t *b) { while (b->readpos == b->writepos); // nothing to consume. wait int next = (b->readpos+1) % BUF_SIZE; void * res = b->buffer[b->readpos]; b->readpos = next; return res; } buf_t *alloc () { buf_t *b = (buf_t *)malloc(sizeof(buf_t)); b->writepos = 0; b->readpos = 0; return b; } However, this implementation is not yet fast enough and should be optimized further. I've tried with different BUF_SIZE values and got some speed-up. Additionaly, I've moved writepos before the buffer and readpos after the buffer to ensure that both variables are on different cache lines which resulted also in some speed. What I need is a speedup of about 400 %. Do you have any ideas how I could achieve this using things like padding etc?

    Read the article

  • Constraints when using WCF for an online multiplayer game

    - by Etan
    I want to build a service oriented game server and client using WCF where users can play card games on different tables after they logged in with an account. I would like to choose WCF due to it's flexibility in exchanging the communication channels. Maybe, a web interface will be added later which can then just use an other channel class. An additional plus is the ability for contexts which could be used to track a user over a whole gaming session. Are there some constraints I should be aware of when using WCF for the communication between the client and the server?

    Read the article

  • Ensuring that all callbacks were completed before sending a new request through a DuplexChannel usin

    - by Etan
    I am experiencing some issues when using a Callback in a WCF project. First, the server invokes some function Foo on the client which then forwards the request to a Windows Forms GUI: GUI CLASS delegate void DoForward(); public void ForwardToGui() { if (this.cmdSomeButton.InvokeRequired) { DoForward d = new DoForward(ForwardToGui); this.Invoke(d); } else { Process(); // sets result variable in callback class as soon as done } } } CALLBACK CLASS object _m = new object(); private int _result; public int result { get { return _result; } set { _result = value; lock(_m) { Monitor.PulseAll(_m); } } } [OperationContract] public int Foo() { result = 0; Program.Gui.ForwardToGui(); lock(_m) { Monitor.Wait(_m, 30000); } return result; } The problem now is that the user should be able to cancel the process, which doesn't work properly: SERVER INTERFACE [OperationContract] void Cleanup(); GUI CLASS private void Gui_FormClosed(object sender, EventArgs e) { Program.callbackclass.nextAction = -1; // so that the monitor pulses and Foo() returns Program.server.Cleanup(); } The problem with this is that Cleanup() hangs. However, when I close the form when Process() is not running, it works properly. The source seems to be that the Cleanup() is called before the monitor pulses etc and therefore a new request is sent to the server before the last request from the server has not yet been responded. How can I solve this problem? How can I ensure before calling Cleanup() that no Foo() is currently being executed?

    Read the article

1