While the author mentions this is mostly applicable to things like FPGAs, there's also an application in gamedev (or any distributed physics simulation). Floating point calculations are tricky to get deterministic across platforms[0]. One solution is to skip floats entirely and implement a fixed-point physics engine. You'd need something like CORDIC to implement all the trig functionality.
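For the trig part, the heart of a fixed-point CORDIC sine/cosine is small. Here's a toy Q16.16 sketch (my own, written from memory for this comment, so treat it as illustrative rather than production code):

    /* Toy fixed-point CORDIC in Q16.16, rotation mode. Returns sin and cos of
       `angle` (radians, Q16.16, |angle| <= pi/2; range-reduce before calling).
       Assumes >> on negative ints is an arithmetic shift, as on mainstream compilers. */
    #include <stdint.h>

    #define CORDIC_ITERS 16
    static const int32_t atan_tab[CORDIC_ITERS] = {   /* atan(2^-i) in Q16.16 */
        51472, 30386, 16055, 8150, 4091, 2047, 1024, 512,
        256, 128, 64, 32, 16, 8, 4, 2
    };
    #define CORDIC_K 39797   /* 1/gain ~= 0.60725 in Q16.16 */

    static void cordic_sincos(int32_t angle, int32_t *s, int32_t *c) {
        int32_t x = CORDIC_K, y = 0, z = angle;
        for (int i = 0; i < CORDIC_ITERS; i++) {
            int32_t xs = x >> i, ys = y >> i;
            if (z >= 0) {                 /* rotate towards the remaining angle */
                x -= ys; y += xs; z -= atan_tab[i];
            } else {
                x += ys; y -= xs; z += atan_tab[i];
            }
        }
        *c = x;   /* cos(angle), Q16.16 */
        *s = y;   /* sin(angle), Q16.16 */
    }

Starting x at CORDIC_K instead of 1.0 pre-divides out the CORDIC gain, so no final multiply is needed.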
I started working on such a thing as a fun side project a few years ago but never finished it. One of these days I hope to get back to it.
That blog post is now a decade old, but includes an important quote:
> The IEEE standard does guarantee some things. It guarantees more than the floating-point-math-is-mystical crowd realizes, but less than some programmers might think.
To summarize the blog post, it highlights a few things (some less clearly than I would like):
* x87 was wonky
* You need to ensure rounding modes, flush-to-zero, etc. are consistently set (see the sketch after this list)
* Some older processors don't have FMA
* Approximate instructions (rsqrtps et al.) don't have a consistent spec
* Compilers may reassociate expressions
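On the rounding-mode / flush-to-zero bullet, pinning the FP environment up front is short. A sketch using standard fenv.h plus the SSE control-register intrinsics (call it once per thread, since MXCSR is per-thread state):

    #include <fenv.h>
    #include <xmmintrin.h>   /* _MM_SET_FLUSH_ZERO_MODE */
    #include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE */

    static void pin_fp_environment(void) {
        /* Round-to-nearest-even, the IEEE default */
        fesetround(FE_TONEAREST);
        /* Handle denormals per IEEE instead of flushing them to zero */
        _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_OFF);
        _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_OFF);
    }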
For small routines and self-written libraries, it's straightforward, if painful, to ensure you avoid all of that.
Briefly mentioned in the blog post is that IEEE-754 (2008) made the spec more explicit and effectively assumed the death of x87. It's 2024 now, so you can definitely avoid x87. Similarly, FMA is part of the IEEE-754 2008 spec and has been built into all modern processors since (Haswell and later on Intel).
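The practical upshot, at least as I read it: if you want FMA in your results, ask for it explicitly rather than letting the compiler decide, and disable contraction so it can't fuse (or not fuse) a*b + c behind your back. For example:

    /* Compile with -ffp-contract=off (GCC/Clang) so multiplies and adds are never
       silently fused; when you do want the fused form, spell it out: */
    #include <math.h>

    float dot3(const float *a, const float *b) {
        float acc = a[0] * b[0];          /* plain multiply, rounded once */
        acc = fmaf(a[1], b[1], acc);      /* explicit fused multiply-add */
        acc = fmaf(a[2], b[2], acc);
        return acc;
    }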
There are still cross-architecture differences like 8-wide AVX2 vs 4-wide NEON that can trip you up, but if you are writing assembly, intrinsics, or just C that you inspect with Compiler Explorer or objdump, you can look at the output and say "Yep, that'll be consistent".
> but if you are writing assembly, intrinsics, or just C that you inspect with Compiler Explorer or objdump, you can look at the output and say "Yep, that'll be consistent".
Surely people have written tooling for those checks for various CPUs?
Also, is it that ‘simple’? Reading https://github.com/llvm/llvm-project/issues/62479, calculations that the compiler does and that only end up in the executable as constants can make results differ between architectures or compilers (possibly even between compiler runs, if the compiler runs multi-threaded and constant-folding order depends on timing, though I find it hard to imagine how exactly that would happen).
So you'd want to check the constants in the code too, but then there's no guarantee that compilers do the same constant folding. You can try to get more consistency by being really diligent about using constexpr, but that doesn't guarantee it either.
Years ago, I was programming in Ada and ran across a case where the value of a constant in a program differed from the same constant being converted at runtime. Took a while to track that one down.
The same reasoning applies though. The compiler is just another program. Outside of doing constant folding on things that are unspec'ed or not required (like rsqrtps and most transcendentals), you should get consistent results even between architectures.
Of course, the specific line linked to in that GH issue shows that LLVM will attempt constant folding of various trig functions.
The majority of code I'm talking about though uses constants that are some long, explicit number, and doesn't do any math on them that would then be amenable to constant folding itself.
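For what it's worth (my own habit, not something from the blog post or the issue): if you want to be extra sure a constant's bits are identical everywhere, you can write it as a hexfloat literal (C99 / C++17), so the exact bit pattern sits in the source rather than going through a decimal conversion:

    /* 1/sqrt(2) written as the exact float bit pattern rather than a decimal */
    static const float kInvSqrt2 = 0x1.6a09e6p-1f;   /* == 0.70710677f */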
Depends on what you're doing. The main issue here is reductions / accumulations.
That is, if you have a bunch of floats like:
    float sum = 0.f;
    for (int i = 0; i < N; i++) {
        sum += x[i];
    }
and you vectorize it to something like (typing in this comment, errors are likely):
    #include <immintrin.h>

    __m256 sum_8wide = _mm256_setzero_ps();
    for (int i = 0; i < N/8; i++) {
        sum_8wide = _mm256_add_ps(sum_8wide, _mm256_loadu_ps(&x[8*i]));
    }
    // Now sum up the 8 lanes to get the final sum
    __m128 t = _mm_add_ps(_mm256_castps256_ps128(sum_8wide), _mm256_extractf128_ps(sum_8wide, 1));
    t = _mm_hadd_ps(t, t);
    t = _mm_hadd_ps(t, t);
    float sum = _mm_cvtss_f32(t);
that will result in a different accumulation order than if you did it 4-wide and then a reduction. The usual solution is either to use the lowest common denominator (e.g., 4-wide SSE instead of 8-wide AVX) or, the more performance-oriented option, to use the 4-wide SIMD units on ARM to "emulate" an 8-wide virtual vector (~15 years since I wrote NEON... and again, this is in a comment):
    #include <arm_neon.h>

    float32x4_t sum_lo = vdupq_n_f32(0.f);
    float32x4_t sum_hi = vdupq_n_f32(0.f);
    for (int i = 0; i < N/8; i++) {
        sum_lo = vaddq_f32(sum_lo, vld1q_f32(&x[8*i]));
        sum_hi = vaddq_f32(sum_hi, vld1q_f32(&x[8*i + 4]));
    }
    // Reduce sum_lo and sum_hi, then the 8 lanes, in the same order as the AVX version
You would want to write a "virtual SIMD" wrapper library, so you don't do this manually in lots of places.
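Something along these lines (again, sketched in a comment; the names are made up and only a few ops are shown):

    /* A "virtual 8-wide" type: one AVX register on x86, two NEON registers on ARM,
       so accumulation happens in the same lane order on both. */
    #if defined(__AVX__)
      #include <immintrin.h>
      typedef __m256 vec8;
      static inline vec8 vec8_zero(void)           { return _mm256_setzero_ps(); }
      static inline vec8 vec8_load(const float *p) { return _mm256_loadu_ps(p); }
      static inline vec8 vec8_add(vec8 a, vec8 b)  { return _mm256_add_ps(a, b); }
    #elif defined(__ARM_NEON)
      #include <arm_neon.h>
      typedef struct { float32x4_t lo, hi; } vec8;
      static inline vec8 vec8_zero(void)           { return (vec8){ vdupq_n_f32(0.f), vdupq_n_f32(0.f) }; }
      static inline vec8 vec8_load(const float *p) { return (vec8){ vld1q_f32(p), vld1q_f32(p + 4) }; }
      static inline vec8 vec8_add(vec8 a, vec8 b)  { return (vec8){ vaddq_f32(a.lo, b.lo), vaddq_f32(a.hi, b.hi) }; }
    #endif

The summation loop is then written once against vec8, and only the final lane reduction needs per-architecture care.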
The author did mention that fixed point was very popular in gamedev before floating point took over thanks to increased hardware capability, and most likely CORDIC was used together with fixed point as well.
> In fact, before IEEE 754 became the popular standard that it is today, fixed point was used all the time (go and ask any gamedev who worked on stuff between 1980 and 2000ish and they'll tell you all about it).
This is a common misconception, but it's not actually the case. For example, look at the Voodoo 1, 2, and 3, which also used fixed-point numbers internally but did not suffer from this problem.
The real issue is that the PS1 has no subpixel precision. In other words, it rounds triangle vertex coordinates to the nearest integers.
Likely the reason they did this is that it lets you avoid division and multiplication hardware entirely: with integer start and end coordinates, line rasterization can be done purely with additions and comparisons.
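To make the "additions and comparisons" point concrete, here's the textbook integer-only line rasterizer (Bresenham); this is illustrative, not a claim about what the PS1 hardware literally does (put_pixel is a placeholder, and the 2*err is just a shift):

    #include <stdlib.h>                     /* abs */
    extern void put_pixel(int x, int y);    /* hypothetical framebuffer write */

    void draw_line(int x0, int y0, int x1, int y1) {
        int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
        int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
        int err = dx + dy;                  /* running error term */
        for (;;) {
            put_pixel(x0, y0);
            if (x0 == x1 && y0 == y1) break;
            int e2 = 2 * err;
            if (e2 >= dy) { err += dy; x0 += sx; }   /* step in x */
            if (e2 <= dx) { err += dx; y0 += sy; }   /* step in y */
        }
    }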
Didn’t the PS1 also lack perspective-correct texture mapping? That would definitely make textures wobbly. AFAIK they compensated for it simply by subdividing geometry as finely as possible (which wasn’t very fine, really).
The folk that made Crash Bandicoot were pretty clever. They figured out that the PlayStation could render untextured, shaded triangles a lot faster than textured triangles, so they "textured" the main character with pixel-scale geometry. This in turn saved them enough memory to use a higher resolution frame buffer mode.
The nphysics physics simulation library for gamedev used this approach, fixed-point math with CORDIC, whenever cross-platform determinism was wanted. nphysics has since been deprecated, however.
The newer Rapier library (a rewrite of nphysics) instead relies on the guarantees of IEEE-754 2008 to provide cross-platform determinism, which means it doesn't work on old platforms, but it is deterministic across modern platforms, including wasm. And yes, you can't rely on the transcendental routines provided by each platform (like sine, cosine, etc.); those need to be implemented in a way that works the same everywhere. But this is possible if you avoid running on non-compliant platforms.
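For example (a toy sketch, not Rapier's actual code): a sine built only from + and * with fixed constants computes bit-identical results on every compliant platform, as long as the compiler isn't allowed to contract it into FMAs:

    /* Sine for |x| <= pi/4 from a fixed polynomial; range reduction omitted.
       Compile with -ffp-contract=off so no FMA sneaks in on some targets. */
    static float det_sinf(float x) {
        float x2 = x * x;
        return x * (1.0f + x2 * (-1.6666667e-1f
                  + x2 * ( 8.3333333e-3f
                  + x2 * (-1.9841270e-4f))));
    }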
[0]: https://randomascii.wordpress.com/2013/07/16/floating-point-...