I'm making a software renderer which does per-polygon rasterization using a floating point digital differential analyzer algorithm. My idea was to create two threads for rasterization and have them work like so: one thread draws each even scanline in a polygon and the other thread draws each odd scanline, and they both start working at the same time, but the main application waits for both of them to finish and then pauses them before continuing with other computations.
As this is the first time I'm making a threaded application, I'm not sure if the following method for thread synchronization is correct:
First of all, I use two global variables to control the two threads, if a global variable is set to 1, that means the thread can start working, otherwise it must not work. This is checked by the thread running an infinite loop and if it detects that the global variable has changed its value, it does its job and then sets the variable back to 0 again. The main program also uses an empty while to check when both variables become 0 after setting them to 1.
Second, each thread is assigned a global structure which contains information about the triangle that is about to be rasterized. The structures are filled in by the main program before setting the global variables to 1.
My dilemma is that, while this process works under some conditions, it slows down the program considerably, and also it fails to run properly when compiled for Release in Visual Studio, or when compiled with any sort of -O optimization with gcc (i.e. nothing on screen, even SEGFAULTs). The program isn't much faster by default without threads, which you can see for yourself by commenting out the #define THREADS directive, but if I apply optimizations, it becomes much faster (especially with gcc -Ofast -march=native).
N.B. It might not compile with gcc because of fscanf_s calls, but you can replace those with the usual fscanf, if you wish to use gcc.
Because there is a lot of code, too much for here or pastebin, I created a git repository where you can view it.
My questions are:
Why does adding these two threads slow down my application?
Why doesn't it work when compiling for Release or with optimizations?
Can I speed up the application with threads?
If so, how?
Thanks in advance.