Speeding up procedural texture generation
- by FalconNL
Recently I've begun working on a game that takes place in a procedurally generated solar system. After a bit of a learning curve (having neither worked with Scala, OpenGL 2 ES or Libgdx before), I have a basic tech demo going where you spin around a single procedurally textured planet:
The problem I'm running into is the performance of the texture generation. A quick overview of what I'm doing: a planet is a cube that has been deformed to a sphere. To each side, a n x n (e.g. 256 x 256) texture is applied, which are bundled in one 8n x n texture that is sent to the fragment shader. The last two spaces are not used, they're only there to make sure the width is a power of 2. The texture is currently generated on the CPU, using the updated 2012 version of the simplex noise algorithm linked to in the paper 'Simplex noise demystified'. The scene I'm using to test the algorithm contains two spheres: the planet and the background. Both use a greyscale texture consisting of six octaves of 3D simplex noise, so for example if we choose 128x128 as the texture size there are 128 x 128 x 6 x 2 x 6 = about 1.2 million calls to the noise function.
The closest you will get to the planet is about what's shown in the screenshot and since the game's target resolution is 1280x720 that means I'd prefer to use 512x512 textures. Combine that with the fact the actual textures will of course be more complicated than basic noise (There will be a day and night texture, blended in the fragment shader based on sunlight, and a specular mask. I need noise for continents, terrain color variation, clouds, city lights, etc.) and we're looking at something like 512 x 512 x 6 x 3 x 15 = 70 million noise calls for the planet alone. In the final game, there will be activities when traveling between planets, so a wait of 5 or 10 seconds, possibly 20, would be acceptable since I can calculate the texture in the background while traveling, though obviously the faster the better.
Getting back to our test scene, performance on my PC isn't too terrible, though still too slow considering the final result is going to be about 60 times worse:
128x128 : 0.1s
256x256 : 0.4s
512x512 : 1.7s
This is after I moved all performance-critical code to Java, since trying to do so in Scala was a lot worse. Running this on my phone (a Samsung Galaxy S3), however, produces a more problematic result:
128x128 : 2s
256x256 : 7s
512x512 : 29s
Already far too long, and that's not even factoring in the fact that it'll be minutes instead of seconds in the final version. Clearly something needs to be done. Personally, I see a few potential avenues, though I'm not particularly keen on any of them yet:
Don't precalculate the textures, but let the fragment shader calculate everything. Probably not feasible, because at one point I had the background as a fullscreen quad with a pixel shader and I got about 1 fps on my phone.
Use the GPU to render the texture once, store it and use the stored texture from then on. Upside: might be faster than doing it on the CPU since the GPU is supposed to be faster at floating point calculations. Downside: effects that cannot (easily) be expressed as functions of simplex noise (e.g. gas planet vortices, moon craters, etc.) are a lot more difficult to code in GLSL than in Scala/Java.
Calculate a large amount of noise textures and ship them with the application. I'd like to avoid this if at all possible.
Lower the resolution. Buys me a 4x performance gain, which isn't really enough plus I lose a lot of quality.
Find a faster noise algorithm. If anyone has one I'm all ears, but simplex is already supposed to be faster than perlin.
Adopt a pixel art style, allowing for lower resolution textures and fewer noise octaves. While I originally envisioned the game in this style, I've come to prefer the realistic approach.
I'm doing something wrong and the performance should already be one or two orders of magnitude better. If this is the case, please let me know.
If anyone has any suggestions, tips, workarounds, or other comments regarding this problem I'd love to hear them.