Hi!
I've written a server app for a board game in C++ (actually two parts: a proxy server and a game server). It uses IOCP as the sockets interface.
For that app I've also written a "client simulator" (hereafter "client") that spawns many client connections, each of which plays at very high speed, driving the CPU to 100% utilization.
So, here is the topology:
Game server - holds the game state. Real players do not connect to it directly but through the proxy server. When a player joins a game, the proxy requests it on behalf of that player, and the game server spawns a "player instance" for that player. From then on, every notification between the game server and the player passes through the proxy.
Proxy server - holds TCP connections with the real players. Players communicate with the game server through it only.
Client simulator - connects to the proxy only.
When running the server (again, it's actually two server apps) and the client locally, it all works just fine. I'm talking about 40k+ player instances, all of them active in a game.
On the other hand, when running the server remotely with, say, 1000 clients playing, things get strange.
For example, I run it as said above. Then with Task Manager I kill the client simulator app ("End Process Tree").
Then it seems like the buffer of the remote server got modified by another thread, or in other words, memory corruption has occurred.
The server crashes because it received an unknown message id (it's a custom protocol where each message has its own unique number).
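To make the "unknown message id" symptom more concrete, here is a minimal sketch of how a message id might be read from a TCP byte stream. The wire format (2-byte little-endian payload length, 2-byte message id, then the payload) and the names `Message` and `FrameParser` are assumptions made up for illustration, not the actual protocol. The point it shows is that one completed receive can hold a partial message or several messages, so the id should only be read once a whole frame has been buffered:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical wire format (an assumption, not the real protocol):
//   bytes 0-1: payload length, little-endian
//   bytes 2-3: message id, little-endian
//   bytes 4..: payload
struct Message {
    std::uint16_t id;
    std::vector<unsigned char> payload;
};

// Accumulates received bytes per connection and hands out whole messages only.
class FrameParser {
public:
    // Append whatever one completed receive delivered (may be a partial frame).
    void feed(const unsigned char* data, std::size_t len) {
        buffer_.insert(buffer_.end(), data, data + len);
    }

    // Extract one complete message; returns false if more bytes are needed.
    bool next(Message& out) {
        const std::size_t kHeaderSize = 4;
        if (buffer_.size() < kHeaderSize) {
            return false;  // header not complete yet
        }
        const std::size_t payloadLen =
            static_cast<std::size_t>(buffer_[0]) |
            (static_cast<std::size_t>(buffer_[1]) << 8);
        if (buffer_.size() < kHeaderSize + payloadLen) {
            return false;  // payload not complete yet
        }
        out.id = static_cast<std::uint16_t>(buffer_[2] | (buffer_[3] << 8));
        out.payload.assign(buffer_.begin() + kHeaderSize,
                           buffer_.begin() + kHeaderSize + payloadLen);
        // Keep any bytes that belong to the next frame.
        buffer_.erase(buffer_.begin(),
                      buffer_.begin() + kHeaderSize + payloadLen);
        return true;
    }

private:
    std::vector<unsigned char> buffer_;
};
```

In this sketch each connection would own one `FrameParser`, call `feed()` from its receive completion, and then loop on `next()` until it returns false.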
To make things clear, here is how I run the apps:
PC1 - game server and client simulator (the clients connect to the proxy).
PC2 - proxy server.
The strangest thing is this:
Only the remote side gets "corrupted". Remote in the sense that it's not the PC I use to code the app (VC++ 2008).
Let's call the PC I use to code the apps "PC1".
Now, for example, if I run the game server on PC1 (meaning the proxy server is on PC2 and the client simulator on PC1), then the proxy server crashes with an "unknown message id" error.
Another variation: when I run the proxy server on PC1 (again, the dev machine) and the game server and the client simulator on PC2, then the game server on PC2 crashes.
As for the IOCP config:
The servers' internal connections use the default receive/send buffer sizes. I even tried setting them to 1 MB, but no luck.
I have three PCs in total:
2 x Vista 64bit <<-- one of those is the dev machine. The other is connected through WiFi.
1 x WinXP 32bit
They're all connected in a "full duplex" manner.
What could be the reason? I've tried just about everything: stack tracing, recording some actions (like read/write logging)...
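As an illustration of the kind of read/write logging I mean, here is a minimal sketch of a helper that formats each sent or received buffer as a hex line, so the sender's log and the receiver's log can be compared byte for byte. The function names, `connectionId`, and `streamOffset` are hypothetical bookkeeping for this example, not taken from the real code:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Format a buffer as space-separated lowercase hex bytes, e.g. "01 ff 20".
std::string hexDump(const unsigned char* data, std::size_t len) {
    static const char kDigits[] = "0123456789abcdef";
    std::string out;
    out.reserve(len * 3);
    for (std::size_t i = 0; i < len; ++i) {
        if (i != 0) {
            out += ' ';
        }
        out += kDigits[data[i] >> 4];
        out += kDigits[data[i] & 0x0F];
    }
    return out;
}

// Build one log line for a completed send or receive.
// streamOffset is the number of bytes already logged on this connection in
// this direction, so both ends can be lined up even if the chunking differs.
std::string formatIoLog(const char* direction,
                        unsigned long connectionId,
                        unsigned long long streamOffset,
                        const unsigned char* data,
                        std::size_t len) {
    char prefix[96];
    std::snprintf(prefix, sizeof(prefix), "%s conn=%lu off=%llu len=%lu: ",
                  direction, connectionId, streamOffset,
                  static_cast<unsigned long>(len));
    return std::string(prefix) + hexDump(data, len);
}
```

Comparing the "SEND" lines on one machine with the "RECV" lines on the other at the same offsets should show whether the bytes differ on the wire or only after they reach the server's buffers.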
I want to stress that only the PC I'm not using to code the apps crashes (actually the server app "role" which is running on it - sometimes the game server and sometimes the proxy server).
At first I thought that maybe the wireless PC has problems (it's wireless..) but TCP has its own mechanisms to make sure the packet is delivered properly. Also, the crash happens with the two PCs that are physically connected as well (Vista vs. XP).
Another option is that the Windows DLL versions might have problems, but then again, one of the tests is Vista vs. Vista, and the other is Vista vs. XP.
Any ideas?