Hacker News

I think I get why C++ thru C are all similar (all compile to similar assembly?), but I don't get why Go thru maybe Racket are all in what looks like a pretty narrow clump. Is there a common element there?


The common element is that they're written with the most obvious version of the code, while the ones in the faster bucket are either explicitly vectorized or written in non-obvious ways that help the compiler auto-vectorize. For example, consider the Objective-C version of the loop in leibniz.m:

  for (long i = 2; i <= rounds + 2; i++) {
      x *= -1.0;
      pi += x / (2.0 * i - 1.0);
  }
With my older version of Clang, the resulting assembly at -O3 isn't vectorized. Now look at the C version in leibniz.c:

  rounds += 2u; // do this outside the loop
  for (unsigned i=2u; i < rounds; ++i) // use ++i instead of i++
  {
      double x = -1.0 + 2.0 * (i & 0x1); // allows vectorization
      pi += (x / (2u * i - 1u)); // double / unsigned = double
  }
This produces vectorized code when I compile it. When I replace the Objective-C loop with that code, the compiler produces vectorized code there as well.

You see something similar in the other kings-of-speed languages. Zig? It's the C code ported directly to a different syntax. D? Exact same. Fortran 90? Slightly different, but still obviously written with compiler vectorization in mind.

(For what it's worth, the trunk version of Clang is able to auto-vectorize either version of the loop without help.)


I think it's SIMD generation. Managed runtimes have a much harder time auto-vectorizing, because you can't do much static analysis about things like array sizes. Note that the true low-level tools are all clustered around 200-300 ms, and that the next level up, the "managed" runtimes, are all around 1-2 s.

The one exception is sort of an exception that proves the rule: it's marked "C# (SIMD)", and looks like a native compiler and not a managed one.


They’re measuring total program execution time, including startup and teardown. Languages with a more complex runtime take longer to start up, and they all seem to have optimized that roughly equally.


Some features some of those languages have:

- run bytecode

- very high level

- GC memory

But not all have these traits. Not sure.



