While the author mentions this is mostly applicable to things like FPGAs, there's also an application in gamedev (or any distributed physics simulation). Floating point calculations are tricky to get deterministic across platforms[0]. One solution is to skip floats entirely and implement a fixed-point physics engine. You'd need something like CORDIC to implement all the trig functionality.
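For the trig part, the heart of a fixed-point CORDIC sine/cosine is small. Here's a toy Q16.16 sketch (my own, written from memory for this comment, so treat it as illustrative rather than production code):

    /* Toy fixed-point CORDIC in Q16.16, rotation mode. Returns sin and cos of
       `angle` (radians, Q16.16, |angle| <= pi/2; range-reduce before calling).
       Assumes >> on negative ints is an arithmetic shift, as on mainstream compilers. */
    #include <stdint.h>

    #define CORDIC_ITERS 16
    static const int32_t atan_tab[CORDIC_ITERS] = {   /* atan(2^-i) in Q16.16 */
        51472, 30386, 16055, 8150, 4091, 2047, 1024, 512,
        256, 128, 64, 32, 16, 8, 4, 2
    };
    #define CORDIC_K 39797   /* 1/gain ~= 0.60725 in Q16.16 */

    static void cordic_sincos(int32_t angle, int32_t *s, int32_t *c) {
        int32_t x = CORDIC_K, y = 0, z = angle;
        for (int i = 0; i < CORDIC_ITERS; i++) {
            int32_t xs = x >> i, ys = y >> i;
            if (z >= 0) {                 /* rotate towards the remaining angle */
                x -= ys; y += xs; z -= atan_tab[i];
            } else {
                x += ys; y -= xs; z += atan_tab[i];
            }
        }
        *c = x;   /* cos(angle), Q16.16 */
        *s = y;   /* sin(angle), Q16.16 */
    }

Starting x at CORDIC_K instead of 1.0 pre-divides out the CORDIC gain, so no final multiply is needed.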
I started working on such a thing as a fun side project a few years ago but never finished it. One of these days I hope to get back to it.
That blog post is now a decade old, but includes an important quote:
> The IEEE standard does guarantee some things. It guarantees more than the floating-point-math-is-mystical crowd realizes, but less than some programmers might think.
To summarize the blog post, it highlights a few things (some less clearly than I would like):
* x87 was wonky
* You need to ensure rounding modes, flush-to-zero, etc. are consistently set (see the sketch after this list)
* Some older processors don't have FMA
* Approximate instructions (rsqrtps et al.) don't have a consistent spec
* Compilers may reassociate expressions
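On the rounding-mode / flush-to-zero bullet, pinning the FP environment up front is short. A sketch using standard fenv.h plus the SSE control-register intrinsics (call it once per thread, since MXCSR is per-thread state):

    #include <fenv.h>
    #include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE */
    #include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE */

    static void pin_fp_environment(void) {
        /* Round-to-nearest-even, the IEEE default */
        fesetround(FE_TONEAREST);
        /* Handle denormals per IEEE instead of flushing them to zero */
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_OFF);
        _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_OFF);
    }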
For small routines and self-written libraries, it's straightforward, if painful, to ensure you avoid all of that.
Briefly mentioned in the blog post is that IEEE-754 (2008) made the spec more explicit and effectively assumed the death of x87. It's 2024 now, so you can definitely avoid x87. Similarly, FMA is part of the IEEE-754 2008 spec and has been built into all modern processors since (Haswell and later on Intel).
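The practical upshot, at least as I read it: if you want FMA in your results, ask for it explicitly rather than letting the compiler decide, and disable contraction so it can't fuse (or not fuse) a*b + c behind your back. For example:

    /* Compile with -ffp-contract=off (GCC/Clang) so multiplies and adds are never
       silently fused; when you do want the fused form, spell it out: */
    #include <math.h>

    float dot3(const float *a, const float *b) {
        float acc = a[0] * b[0];          /* plain multiply, rounded once */
        acc = fmaf(a[1], b[1], acc);      /* explicit fused multiply-add */
        acc = fmaf(a[2], b[2], acc);
        return acc;
    }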
There are still cross-architecture differences like 8-wide AVX2 vs 4-wide NEON that can trip you up, but if you are writing assembly, intrinsics, or just C that you inspect with Compiler Explorer or objdump, you can look at the output and say "Yep, that'll be consistent".
> but if you are writing assembly, intrinsics, or just C that you inspect with Compiler Explorer or objdump, you can look at the output and say "Yep, that'll be consistent".
Surely people have written tooling for those checks for various CPUs?
Also, is it that ‘simple’? Reading https://github.com/llvm/llvm-project/issues/62479, calculations that the compiler does and that only end up in the executable as constants can make results differ between architectures or compilers (possibly even between compiler runs, if the compiler runs multi-threaded and constant-folding order depends on timing, though I find it hard to imagine how exactly that would happen).
So you'd want to check the constants in the code too, but then there's no guarantee that compilers do the same constant folding. You can try to get more consistency by being really diligent about using constexpr, but that doesn't guarantee it either.
Years ago, I was programming in Ada and ran across a case where the value of a constant in a program differed from the same constant being converted at runtime. Took a while to track that one down.
The same reasoning applies though. The compiler is just another program. Outside of doing constant folding on things that are unspec'ed or not required (like rsqrtps and most transcendentals), you should get consistent results even between architectures.
Of course, the specific line linked to in that GH issue shows that LLVM will attempt constant folding of various trig functions.
The majority of code I'm talking about though uses constants that are some long, explicit number, and doesn't do any math on them that would then be amenable to constant folding itself.
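For what it's worth (my own habit, not something from the blog post or the issue): if you want to be extra sure a constant's bits are identical everywhere, you can write it as a hexfloat literal (C99 / C++17), so the exact bit pattern sits in the source rather than going through a decimal conversion:

    /* 1/sqrt(2) written as the exact float bit pattern rather than a decimal */
    static const float kInvSqrt2 = 0x1.6a09e6p-1f;   /* == 0.70710677f */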
Depends on what you're doing. The main issue here is reductions / accumulations.
That is, if you have a bunch of floats like:
    float sum = 0.f;
    for (int i = 0; i < N; i++) {
        sum += x[i];
    }
and you vectorize it to something like (typing in this comment, errors are likely):
    #include <immintrin.h>

    __m256 sum_8wide = _mm256_setzero_ps();
    for (int i = 0; i < N/8; i++) {
        sum_8wide = _mm256_add_ps(sum_8wide, _mm256_loadu_ps(&x[8*i]));
    }
    // Now sum up the 8 lanes to get the final sum
    __m128 t = _mm_add_ps(_mm256_castps256_ps128(sum_8wide), _mm256_extractf128_ps(sum_8wide, 1));
    t = _mm_hadd_ps(t, t);
    t = _mm_hadd_ps(t, t);
    float sum = _mm_cvtss_f32(t);
that will result in a different accumulation order than if you did it 4-wide and then a reduction. The usual solution is either to use the lowest common denominator (e.g., 4-wide SSE instead of 8-wide AVX) or, the more performance-oriented option, to use the 4-wide SIMD units on ARM to "emulate" an 8-wide virtual vector (~15 years since I wrote NEON... and again, this is in a comment):
    #include <arm_neon.h>

    float32x4_t sum_lo = vdupq_n_f32(0.f);
    float32x4_t sum_hi = vdupq_n_f32(0.f);
    for (int i = 0; i < N/8; i++) {
        sum_lo = vaddq_f32(sum_lo, vld1q_f32(&x[8*i]));
        sum_hi = vaddq_f32(sum_hi, vld1q_f32(&x[8*i + 4]));
    }
    // Reduce sum_lo and sum_hi, then the 8 lanes, in the same order as the AVX version
You would want to write a "virtual SIMD" wrapper library, so you don't do this manually in lots of places.
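Something along these lines (again, sketched in a comment; the names are made up and only a few ops are shown):

    /* A "virtual 8-wide" type: one AVX register on x86, two NEON registers on ARM,
       so accumulation happens in the same lane order on both. */
    #if defined(__AVX__)
      #include <immintrin.h>
      typedef __m256 vec8;
      static inline vec8 vec8_zero(void)           { return _mm256_setzero_ps(); }
      static inline vec8 vec8_load(const float *p) { return _mm256_loadu_ps(p); }
      static inline vec8 vec8_add(vec8 a, vec8 b)  { return _mm256_add_ps(a, b); }
    #elif defined(__ARM_NEON)
      #include <arm_neon.h>
      typedef struct { float32x4_t lo, hi; } vec8;
      static inline vec8 vec8_zero(void)           { return (vec8){ vdupq_n_f32(0.f), vdupq_n_f32(0.f) }; }
      static inline vec8 vec8_load(const float *p) { return (vec8){ vld1q_f32(p), vld1q_f32(p + 4) }; }
      static inline vec8 vec8_add(vec8 a, vec8 b)  { return (vec8){ vaddq_f32(a.lo, b.lo), vaddq_f32(a.hi, b.hi) }; }
    #endif

The summation loop is then written once against vec8, and only the final lane reduction needs per-architecture care.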
The author did mention that fixed point was very popular in gamedev before floating point took over thanks to increased hardware capability, and most likely CORDIC was used together with fixed point as well.
> In fact, before IEEE 754 became the popular standard that it is today, fixed point was used all the time (go and ask any gamedev who worked on stuff between 1980 and 2000ish and they'll tell you all about it).
This is a common misconception, but it's not actually the case. For example, look at the Voodoo 1, 2, and 3, which also used fixed-point numbers internally but did not suffer from this problem.
The real issue is that the PS1 has no subpixel precision. In other words, it rounds triangle vertex coordinates to the nearest integers.
Likely the reason they did this is that it lets you avoid division and multiplication hardware entirely: with integer start and end coordinates, line rasterization can be done purely with additions and comparisons.
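To make the "additions and comparisons" point concrete, here's the textbook integer-only line rasterizer (Bresenham); this is illustrative, not a claim about what the PS1 hardware literally does (put_pixel is a placeholder, and the 2*err is just a shift):

    #include <stdlib.h>                     /* abs */
    extern void put_pixel(int x, int y);    /* hypothetical framebuffer write */

    void draw_line(int x0, int y0, int x1, int y1) {
        int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
        int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
        int err = dx + dy;                  /* running error term */
        for (;;) {
            put_pixel(x0, y0);
            if (x0 == x1 && y0 == y1) break;
            int e2 = 2 * err;
            if (e2 >= dy) { err += dy; x0 += sx; }   /* step in x */
            if (e2 <= dx) { err += dx; y0 += sy; }   /* step in y */
        }
    }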
Didn’t the PS1 also lack perspective-correct texture mapping? That would definitely make textures wobbly. AFAIK they compensated for it simply by subdividing geometry as finely as possible (which wasn’t very fine, really).
The folk that made Crash Bandicoot were pretty clever. They figured out that the PlayStation could render untextured, shaded triangles a lot faster than textured triangles, so they "textured" the main character with pixel-scale geometry. This in turn saved them enough memory to use a higher resolution frame buffer mode.
The nphysics physics simulation library for gamedev used this approach, fixed-point math with CORDIC, whenever cross-platform determinism was wanted. nphysics has since been deprecated, however.
The newer Rapier library (a rewrite of nphysics) instead relies on the guarantees of IEEE-754 2008 to provide cross-platform determinism, which means it doesn't work on old platforms, but it is deterministic across modern platforms, including wasm. And yes, you can't rely on the transcendental routines provided by each platform (like sine, cosine, etc.); those need to be implemented in a way that works the same everywhere. But this is possible if you avoid running on non-compliant platforms.
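For example (a toy sketch, not Rapier's actual code): a sine built only from + and * with fixed constants computes bit-identical results on every compliant platform, as long as the compiler isn't allowed to contract it into FMAs:

    /* Sine for |x| <= pi/4 from a fixed polynomial; range reduction omitted.
       Compile with -ffp-contract=off so no FMA sneaks in on some targets. */
    static float det_sinf(float x) {
        float x2 = x * x;
        return x * (1.0f + x2 * (-1.6666667e-1f
                  + x2 * ( 8.3333333e-3f
                  + x2 * (-1.9841270e-4f))));
    }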
[0]: https://randomascii.wordpress.com/2013/07/16/floating-point-...