4 Comments
Patrolin's avatar

Awesome post. Why do some of the FileRead() calls take 3 arguments while others take 4? Also, the last example computes LaneRange(values_count) twice, seemingly for no reason?

Tejas's avatar

Great post, thank you for sharing. Do you intend to write all or most of your future programs like this?

Also, I kinda wonder what would happen if, hypothetically, every program running on a computer were written to allocate NUM_CORES_ON_MACHINE threads up front and use this multicore-by-default approach. I would imagine that in 99% of normal home-user scenarios, where you really only care about one program (the one you are currently interacting with), it would be very beneficial. But if for some reason you were running lots of individual processes at once and cared about how long it takes for all of them to complete, there might not be much difference in the overall time taken...

TJ Kotha's avatar

In the context of a broader game engine design, how might this "CPU shader" approach be designed?

I think this may be one of those areas that, as you mention at the end of the article, may not fit 100% (and in fact requires heterogeneous systems), but I'll try to flesh out the idea as a thought exercise.

In an ideal world, it seems like each thread would simply run its own copy of the game loop. Thread 0 would probably be responsible for keeping time and gathering inputs (with a barrier sync if that's required), and afterwards each thread would pull the relevant update tasks from its portion of the update queue (and enqueue any tasks it'd like done at the end of its update section).
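As a rough sketch of the "each thread pulls tasks from its portion of the update queue" idea: the code below splits a flat task array evenly across lanes, in the spirit of the LaneRange() call mentioned above. The names `lane_range`, `lane_update_tick`, `Range`, and `UpdateTask` are hypothetical and are not taken from the article; the article's LaneRange() may have a different signature.

```c
#include <assert.h>
#include <stddef.h>

// Half-open index range [first, opl), where "opl" means one past the last index.
typedef struct Range { size_t first; size_t opl; } Range;

// Hypothetical helper: split `count` items as evenly as possible across
// `lane_count` lanes. The first (count % lane_count) lanes get one extra item.
Range lane_range(size_t count, size_t lane_idx, size_t lane_count)
{
  assert(lane_count > 0 && lane_idx < lane_count);
  size_t per_lane = count / lane_count;
  size_t leftover = count % lane_count;
  size_t extra_before = (lane_idx < leftover) ? lane_idx : leftover;
  size_t first = lane_idx*per_lane + extra_before;
  size_t size = per_lane + ((lane_idx < leftover) ? 1 : 0);
  Range result = {first, first + size};
  return result;
}

// Hypothetical task type: a stand-in for whatever work one entity needs per tick.
typedef struct UpdateTask { int entity_id; int times_updated; } UpdateTask;

// Every lane calls this with the same task array; each lane only touches
// its own slice, so no locking is needed inside the loop.
void lane_update_tick(UpdateTask *tasks, size_t task_count,
                      size_t lane_idx, size_t lane_count)
{
  Range r = lane_range(task_count, lane_idx, lane_count);
  for(size_t i = r.first; i < r.opl; i += 1)
  {
    tasks[i].times_updated += 1;
  }
}
```

In a real loop, every thread would call `lane_update_tick` with its own `lane_idx` once per tick, and a barrier after the call would keep any lane from starting the next tick early.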

I think one problem with this approach, though, is that components like OpenGL (or SDL, iirc) lock themselves to the main thread and require a response from that thread to do anything. So you can't arbitrarily assign any thread to interact with those systems. That can be argued to be an issue of engineering cruft.

A more fundamental issue may lie in things like the input and timekeeping code. For game logic, it seems like it'd be best to have a referee thread that acts as the source of truth. If multiple threads independently gather input or do timekeeping, that'd become messy very quickly.

Let me know your thoughts

Ryan Fleury's avatar

When external systems (like those exposed by the operating system, or GPU APIs) enforce their own timeline or thread assignment constraints, then they need to be pulled out into their own heterogeneous timelines / thread groups. In the debugger we have this with the Windows event loop, the GPU APIs, and also the debug APIs - you need to dedicate a thread to wait for debug events which is not locked to the refresh rate - it must go much faster - and on Linux, only one thread can actually be attached to other threads as a debugger.

In my game project, I separate UI/rendering onto its own timeline, and keep game simulation on its own timeline. I also pull input event gathering into a separate timeline. I wrote all of that before I learned the stuff in this post, so these are just individual threads rather than thread groups / wavefronts / whatever you want to call them. But, when I keep going on that project, what I will probably do is just convert the UI/render ("user") thread into a "user" thread *group*, and same thing with the game simulation thread. Basically all that requires is going and annotating the code appropriately, and redesigning certain algorithms when needed.

When it comes to game simulation, I don't think there is a problem if you just deterministically decide on the timestamps in absolute space, given the tick index. If you do that, then all threads should just get the right answer. But if you ever did need a single source of truth, then you can always simply do that by going narrow, and then syncing the value. No problem.
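A minimal sketch of the "decide on the timestamps in absolute space, given the tick index" point, assuming a fixed tick rate: if the time of a tick is a pure function of its index, every thread that evaluates it gets the same answer without any communication. The name `tick_time_us` and the microsecond unit are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: absolute start time of tick `tick_idx`, in microseconds
// since tick 0, for a fixed simulation rate of `ticks_per_second`.
// Multiplying before dividing avoids accumulating rounding error, which is
// what would happen if each thread added a rounded per-tick delta instead.
// (The multiplication overflows only after about 1.8e13 ticks.)
uint64_t tick_time_us(uint64_t tick_idx, uint64_t ticks_per_second)
{
  assert(ticks_per_second > 0);
  return (tick_idx * 1000000ull) / ticks_per_second;
}
```

A per-tick delta can be derived the same way on any thread, as `tick_time_us(i + 1, rate) - tick_time_us(i, rate)`, so no referee thread is needed for simulation time.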
