
I personally think the answer is "basically no": Rust, C, and C++ are all the same kind of low-level language, with the same kind of compiler backends and optimizations; any performance trick you can pull off in one you can basically pull off in the other two.

However, in the spirit of the question: someone mentioned the stricter aliasing rules, that one does come to mind on Rust's side over C/C++. On the other hand, signed integer overflow being UB would count for C/C++ (in general: all the UB in C/C++ not present in Rust is there for performance reasons).

Another thing I thought of in Rust and C++'s favor is generics. For instance, in C, qsort() takes a function pointer for the comparison function; in Rust and C++, the standard library sorting functions are templated on the comparison function. This means it's much easier for the compiler to specialize the sorting function, inline the comparisons and optimize around it. I don't know if C compilers specialize qsort() based on the comparison function this way. They might, but it's certainly a lot more to ask of the compiler, and I would argue there are probably many cases like this where C++ and Rust can outperform C because of their much more powerful facilities for specialization.
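A small Rust sketch of the difference (hypothetical function names; both ultimately call the same standard-library slice sort):

```rust
use std::cmp::Ordering;

// C-style: the comparator is a plain function pointer, so each
// comparison is (absent heroic optimization) an indirect call the
// compiler can't easily specialize the sort around.
fn sort_with_fn_ptr(v: &mut [i32], cmp: fn(&i32, &i32) -> Ordering) {
    v.sort_by(cmp);
}

// Rust/C++-style: generic over the comparator type. Each closure type
// gets its own monomorphized copy of the sort, so the comparison is
// trivially inlined at its call sites.
fn sort_generic<F: FnMut(&i32, &i32) -> Ordering>(v: &mut [i32], cmp: F) {
    v.sort_by(cmp);
}

fn main() {
    let mut a = vec![3, 1, 2];
    // A capture-free closure coerces to a fn pointer, like a C callback.
    sort_with_fn_ptr(&mut a, |x, y| x.cmp(y));
    assert_eq!(a, [1, 2, 3]);

    let mut b = vec![3, 1, 2];
    sort_generic(&mut b, |x, y| y.cmp(x));
    assert_eq!(b, [3, 2, 1]);
}
```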





I agree with this whole-heartedly. Rust is a LANGUAGE and C is a LANGUAGE. They are used to describe behaviours. When you COMPILE and then RUN them you can measure speed, but that's dependent on two additional bits that are not intrinsically part of the languages themselves.

Now: the languages may expose patterns that a compiler can make use of to improve optimizations. That IS interesting, but it is not a question of speed. It is a question of expressibility.


No. As you've made clear, it's a question of being able to express things in a way that gives more information to a compiler, allowing it to create executables that run faster.

Saying that a language is about "expressibility" is obvious. A language is nothing other than a form of expression; no more, no less.


Yes. But the speed is dependent on whether or not the compiler makes use of that information and the machine architecture the compiled code runs on.

Speed is a function of all three -- not just the language.

Optimizations for one architecture can lead to perverse behaviours on another (think cache misses and memory layout -- even PROGRAM layout can affect speed).

These things are out of scope of the language and as engineers I think we ought to aim to be a bit more precise. At a coarse level I can understand and even would agree with something like "Python is slower than C", but the same argument applies there as well.

But at some point objectivity ought to enter the playing field.


> ... it's a question of being able to express things in a way that gives more information to a compiler, allowing it to create executables that run faster.

There is expressing idea via code, and there is optimization of code. They are different. Writing what one may think is "fully optimized code" the first time is a mistake, every time, and usually not possible for a codebase of any significant size unless you're a one-in-a-billion savant.

Programming languages, like all languages, are expressive, but only as expressive as the author wants to be, or knows how to be. Rarely does one write code and think "if I'm not expressive enough in a way the compiler understands, my code might be slightly slower! Can't have that!"

No, people write code that they think is correct, compile it, and run it. If your goal is to make the most perfect code you possibly can, instead of the 95% solution that is robust, reliable, maintainable, and testable, you're doing it wrong.

Rust is starting to take up the same mental headspace as LLMs: they're both neat tools. That's it. I don't even mind people being excited about neat tools, because they're neat. The blinders about LLMs/Rust being silver bullets for the software industry need to go. They're just tools.


>in Rust and C++, the standard library sorting functions are templated on the comparison function. This means it's much easier for the compiler to specialize the sorting function, inline the comparisons and optimize around it.

I think this is something of a myth. Typically, a C compiler can't inline the comparison function passed to qsort because libc is dynamically linked (so the code for qsort isn't available). But if you statically link libc and have LTO, or if you just paste the implementation of qsort into your module, then a compiler can inline qsort's comparison function just as easily as a C++ compiler can inline the comparator passed to std::sort. As for type-specific optimizations, these can generally be done just as well for a (void *) that's been cast to a T as they can be for a T (though you do miss out on the possibility of passing by value).

That said, I think there is an indirect connection between a templated sort function and the ability to inline: it forces a compiler/linker architecture where the source code of the sort function is available to the compiler when it's generating code for calls to that function.


qsort is obviously just an example, this situation applies to anything that takes a callback: in C++/Rust, that's almost always generic and the compiler will monomorphize the function and optimize around it, and in C it's almost always a function pointer and a userData argument for state passed on the stack. (and, of course, it applies not just to callbacks, but more broadly to anything templated).

I'm actually very curious about how good C compilers are at specializing situations like this, I don't actually know. In the vast majority of cases, the C compiler will not have access to the code (either because of dynamic linking like in this example, or because the definition is in another translation unit), but what if it does? Either with static linking and LTO, or because the function is marked "inline" in a header? Will C compilers specialize as aggressively as Rust and C++ are forced to do?

If anyone has any resources that have looked into this, I would be curious to hear about it.


If you choose to put a boundary in your code that makes it span over several binaries, so that they can be swapped out at runtime, no compiler in any language can optimize that away, because that would be against the interface you explicitly chose. That's what dynamic linking aka. runtime linking is in C.

This is not an issue for libc, because the behaviour of that is not specified by the code itself, but by the spec, which is why C compilers can and do completely remove or change calls to libc, much to the distress of someone expecting a portable assembler.


Dynamic linking will inhibit inlining entirely, and so yes qsort does not in practice get inlined if libc is dynamically linked. However, compilers can inline definitions across translation units without much of any issue if whole program optimization is enabled.

The use of function pointers doesn't have much of an impact on inlining. If the argument supplied as a parameter is known at compile time then the compiler has no issue performing the direct substitution whether it's a function pointer or otherwise.


My point is that the real issue is just whether or not the function call is compiled as part of the same unit as the function. If it is, then, certainly, modern C compilers can inline functions called via function pointers. The inlining itself is not made easier via the template magic.

Your C comparator function is already "monomorphized": it's just not type safe.


Wouldn't C++ and Rust eventually call down into those same libc functions?

I guess for your example, qsort(), it is optional, and you can choose another implementation of that. Though I tend to find that both standard libraries tend to just delegate those lowest level calls to the posix API.


Rust doesn't call into libc for sort, it has its own implementation in the standard library.

Obviously. How about more complex things like multi-threading APIs though? Can the Rust compiler determine that the subject program doesn't need TLS and produce a binary that doesn't set it up at all, for example?

Optimising out TLS isn't going to be a good example of compiler capability. Whether another thread exists is a global property of a process, and beyond that the system that process operates in.

The compiler isn't going to know for instance that an LD_PRELOAD variable won't be set that would create a thread.


> Whether another thread exists is a global property of a process, and beyond that the system that process operates in.

TLS is a language feature. Whether another thread exists doesn't mean it has to use the same facilities as the main program.

> The compiler isn't going to know for instance that an LD_PRELOAD variable won't be set that would create a thread.

Say the program is not dynamically linked. Still no?


> Say the program is not dynamically linked. Still no?

Whether the program has dynamic dependencies does not dictate whether a thread can be created, that's a property of the OS. Windows has CreateRemoteThread, and I'd be shocked if similar capabilities didn't exist elsewhere.

If I mark something as thread-local, I want it to be thread-local.


I mean, it's not that obvious; your parent asked about it directly, and you could easily imagine it calling into libc for this.

I believe the answer to your question is "yes" because no-std binaries can be mere bytes in size, but I suspect that more complex programs will almost always have some dependency somewhere (possibly even the standard library, but I don't know offhand) that uses TLS somewhere in it.


Many of the libc functions are bad apis with traditionally bad implementations.

There was a contest for which language the fastest tokenizer could be written in. I entered my naive 15-minute Rust version and got second place among roughly 30 entries. First place was hand-crafted assembly.

I am not saying Rust is always faster. But it can be a damn performant language even if you don't think about performance too deeply or don't twist yourself into pretzels to write performant code.

And in my book that counts for something. Because yes, I want my code to be performant, but I'd also not have it blow up on edge cases, have a way to express limitations (like a type system) and have it testable. Rust is pretty good even if you ignore the hype. I write audio DSP code on embedded devices with a strict deadline in C++. I plan to explore Rust for this, especially now since more and more embedded devices start to have more than one processor core.


> On the other hand, signed integer overflow being UB would count for C/C++

C and C++ don't actually have an advantage here because this is only limited to signed integers unless you use compiler-specific intrinsics. Rust's standard library allows you to make overflow on any specific arithmetic operation UB on both signed and unsigned integers.
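A minimal sketch of that opt-in (`unchecked_add` on the standard integer types is stable as of Rust 1.79):

```rust
fn main() {
    let x: u32 = 40;
    // Default safe arithmetic: overflow panics in debug builds and
    // wraps in release builds, but is never UB.
    assert_eq!(x + 2, 42);

    // Opt-in "overflow is UB" arithmetic, like C's signed +, except it
    // works on unsigned types too. The caller must guarantee that no
    // overflow can occur, which is why it requires an unsafe block.
    let y = unsafe { x.unchecked_add(2) };
    assert_eq!(y, 42);
}
```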


It's interesting, because it's a "cultural" thing like the author discusses, it's a very good point. Sure, you can do unsafe integer arithmetic in Rust. And you can do safe integer arithmetic with overflow in C/C++. But in both cases, do you? Probably you don't in either case.

"Culturally", C/C++ has opted for "unsafe-but-high-perf" everywhere, and Rust has "safe-but-slightly-lower-perf" everywhere, and you have to go out of your way to do it differently. Similarly with Zig and memory allocators: sure, you can do "dynamically dispatched stateful allocators that you pass to every function that allocates" in C, but do you? No, you probably don't, you probably just use malloc().
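In Rust, the non-default choices are right there as named methods on every integer type, which is part of what makes the "safe by default" culture stick; a quick sketch:

```rust
fn main() {
    let x: u8 = 255;
    assert_eq!(x.wrapping_add(1), 0);      // explicit two's-complement wrap
    assert_eq!(x.checked_add(1), None);    // explicit overflow detection
    assert_eq!(x.saturating_add(1), 255);  // clamp at the type's maximum
    // Plain `x + 1` panics in a debug build and wraps in a release
    // build; going out of your way means picking one of the above.
}
```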

On the other hand: the author's point that the "culture of safety" and the borrow checker in Rust frees your hand to try some things in Rust which you might not in C/C++, and that leads to higher perf. I think that's very true in many cases.

Again, the answer is more or less "basically no, all these languages are as fast as each other", but the interesting nuance is in what is natural to do as an experienced programmer in them.


C++ isn't always "unsafe-but-high-perf". Move semantics are a good example. The spec goes to great lengths to ensure safety in a huge number of scenarios, at the cost of performance. Mostly shows up in two ways: one, unnecessary destructor calls on moved out objects, and two, allowing throwing exceptions in move constructors which prevents most optimizations that would be enabled by having move constructors in the first place (there was an article here recently on this topic).

Another one is std::shared_ptr. It always uses atomic operations for reference counting and there's no way to disable that behavior or any alternative to use when you don't need thread safety. On the other hand, Rust has both non-atomic Rc and atomic Arc.
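A quick illustration of the choice Rust gives you here:

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

fn main() {
    // Rc: plain non-atomic reference count. It is neither Send nor
    // Sync, so the compiler stops you from sharing it across threads.
    let local = Rc::new(vec![1, 2, 3]);
    assert_eq!(Rc::strong_count(&local), 1);

    // Arc: atomic reference count, safe to hand to another thread,
    // at the cost of atomic increments/decrements on clone and drop.
    let shared = Arc::new(vec![1, 2, 3]);
    let handle_copy = Arc::clone(&shared);
    thread::spawn(move || assert_eq!(handle_copy.len(), 3))
        .join()
        .unwrap();
    assert_eq!(Arc::strong_count(&shared), 1); // clone dropped in the thread
}
```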


> one, unnecessary destructor calls on moved out objects

That issue predates move semantics by ages. The language always had very simple object lifetimes: if you create Foo foo; it will call foo.~Foo() for you, even if you called ~Foo() before. Anything with more complex lifetimes either uses new or placement new behind the scenes.

> Another one is std::shared_ptr.

From what I understand shared_ptr doesn't care that much about performance because anyone using it to manage individual allocations already decided to take performance behind the shed to be shot, so they focused more on making it flexible.


C++11 totally could have started skipping destructors for moved out values only. They chose not to, and part of the reason was safety.

I don't agree with you about shared_ptr (it's very common to use it for a small number of large/collective allocations), but even if what you say is true, it's still a part of C++ that focuses on safety and ignores performance.

Bottom line - C++ isn't always "unsafe-but-high-perf".


The rust standard library does make targeted use of unchecked arithmetic when the containing type can ensure that that overflow never happens and benchmarks have shown that it benefits performance. E.g. in various iterator implementations. Which means the unsafe code has to be written and encapsulated once, users can now use safe for loops and still get that performance benefit.

The main performance difference between Rust, C, and C++ is the level of effort required to achieve it. Differences in level of effort between these languages will vary with both the type of code and the context.

It is an argument about economics. I can write C that is as fast as C++. This requires many times more code that takes longer to write and longer to debug. While the results may be the same, I get far better performance from C++ per unit cost. Budgets of time and money ultimately determine the relative performance of software that actually ships, not the choice of language per se.

I've done parallel C++ and Rust implementations of code. At least for the kind of performance-engineered software I write, the "unit cost of performance" in Rust is much better than C but still worse than C++. These relative costs depend on the kind of software you write.


I like this post. It is well-balanced. Unfortunately, we don't see enough of this in discussions of Rust vs $lang. Can you share a specific example of where the "unit cost of performance" in Rust is worse than C++?

> I can write C that is as fast as C++.

Only if ignoring the C++ compile time execution capabilities.


C++ compile time execution is just a gimmicky code generator; you can do it in any language.

Yeah, I could also be writing in a macro assembler for some Lisp-inspired ideas and optimal performance.

Any code that can be generated at compile-time can be written the old fashioned way.

Including using a macro assembler with a bunch of MASM/TASM-like clever macros.

> I can write C that is as fast as C++

I generally agree with your take, but I don't think C is in the same league as Rust or C++. C has absolutely terrible expressivity, you can't even have proper generic data structures. And something like small string optimization that is in standard C++ is basically impossible in C - it's not an effort question, it's a question of "are you even writing code, or assembly".


Yes, it is the difference between "in theory" and "in practice". In practice, almost no one would write the C required to keep up with the expressiveness of modern C++. The difference in effort is too large to be worth even considering. It is why I stopped using C for most things.

There is a similar argument around using "unsafe" in Rust. You need to use a lot of it in some cases to maintain performance parity with C++. Achievable in theory but a code base written in this way is probably going to be a poor experience for maintainers.

Each of these languages has a "happy path" of applications where differences in expressivity will not have a material impact on the software produced. C has a tiny "happy path" compared to the other two.


Also in theory, one could be using a static analyser all the time as a C or C++ build step.

Lint is part of UNIX toolset since 1979, and we have modern versions freely available like clang tidy.

In practice, many devs keep thinking they know better.


> On the other hand, signed integer overflow being UB would count for C/C++

Rust defaults to the platform treatment of overflows. So it should only make any difference if the compiler is using it to optimize your code, which will most likely lead to unintended behavior.


Rust's overflow behavior isn't platform-dependent. By default, Rust panics on overflow when compiled in debug mode and wraps on overflow when compiled in release mode, and either behavior can be selected in either mode by a compiler flag. In neither case does Rust consider it UB for arithmetic operations to wrap.

Writing a function with UB for overflows doesn't cause unintended behavior if you're doing it to signal there aren't any overflows. And it's very important because it's needed to do basically any loop rewriting.

On the other hand, writing a function that recovers from overflows in an incorrect/useless way still isn't helpful if there are overflows.


This is a tangent, because it clearly didn't pan out, but I had hope for Rust having an edge when I learned how every object is known to be mutable or immutable. This means all the mutable objects can be held together, as well as the immutable ones, and we'd make more efficient use of the cache: memory writes to mutable objects share cache lines with other mutable objects, not immutable objects, and write-back bandwidth isn't wasted on bytes of immutable objects that will never change.

As I don’t see any reason rust would be limited in runtime execution compared to c, I was hoping for this proving an edge.

Apparently not as big an effect as I hoped.


I think it would be quite difficult to actually arrange the memory layout to take advantage of this in a useful way. Mutable/immutable is very context-dependent in rust.

Rust doesn't have immutable memory, only access restrictions. An exclusive owner of an object can always mutate it, or can lend temporary read-only access to it. So the same memory may flip between exclusive-write and shared-read back and forth.

It's an interesting optimization, but not something that could be done directly.
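A sketch of why "immutable" is a property of the access, not the memory:

```rust
fn main() {
    let mut v = vec![1, 2, 3];
    {
        let r = &v;        // shared, read-only view of the same bytes
        assert_eq!(r[0], 1);
    }                      // the shared borrow ends here
    v[0] = 10;             // exclusive access again: same memory, now writable
    assert_eq!(v[0], 10);
}
```

So any layout scheme keyed on "mutable vs immutable" would have to track borrows over time, not just sort objects into two regions once.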


> For instance, in C, qsort() takes a function pointer for the comparison function, in Rust and C++, the standard library sorting functions are templated on the comparison function.

That's more of a critique of the standard libraries than the languages themselves.

If someone were writing C and cared, they could provide their own implementation of sort such that the callback could be inlined (LLVM can inline indirect calls when all call sites are known), just as it would be with C++'s std::sort.

Further, if the libc allows for LTO (active area of research with llvm-libc), it should be possible to optimize calls to qsort this way.


"could" and "should" are doing some very theoretical heavy lifting here.

Sure, at the limit, I agree with you, but in reality, relying on the compiler to do any optimization that you care about (such as inlining an indirect function call in a hot loop) is incredibly unwise. Invariably, in some cases it will fail, and it will fail silently. If you're writing performance critical code in any language, you give the compiler no choice in the matter, and do the optimization yourself.

I do generally agree that in the case of qsort, it's an API design flaw


> qsort, it's an API design flaw

It's just a generic sorting function. If you need more you're supposed to write it yourself. The C standard library exists for convenience, not performance.


Fair point.

> That's more of a critique of the standard libraries than the languages themselves.

But we're right to criticise the standard libraries. If the average programmer uses standard libraries, then the average program will be affected (positively and negatively) by its performance and quirks.


I’m not sure about the other UB opportunities, but in idiomatic rust code this just doesn’t come up.

In C, you frequently write for loops with signed integer counters for the compiler to realize the loop must hit the condition. In Rust you write for..each loops or invoke heavily inlined functional operators. It ends up all lowering to the same assembly. C++ is the worst here because size_t is everywhere in the standard library so you usually end up using size_t for the loop counter, negating the ability for the compiler to exploit UB.
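As a sketch, both of these Rust forms compile down to essentially the same loop, and neither needs overflow UB for the compiler to reason about the trip count:

```rust
// Iterator style: the trip count comes straight from the slice length,
// so there is no counter whose overflow the compiler must reason about.
fn sum(v: &[i64]) -> i64 {
    v.iter().copied().sum()
}

// Indexed style: the range 0..v.len() cannot overflow by construction,
// and the bounds check on v[i] is typically elided since i < v.len().
fn sum_indexed(v: &[i64]) -> i64 {
    let mut total = 0;
    for i in 0..v.len() {
        total += v[i];
    }
    total
}

fn main() {
    let data = [1, 2, 3, 4];
    assert_eq!(sum(&data), 10);
    assert_eq!(sum_indexed(&data), 10);
}
```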


>signed integer overflow being UB would count for C/C++

Then I raise you Zig, which makes unsigned integer overflow UB as well.


Interestingly enough, Zig does not use the same terminology as C/C++/Rust do here. Zig has "illegal behavior," which is either "safety checked" or "unchecked." Unchecked illegal behavior is like undefined behavior. Compiler flags and in-source annotations can change the semantics from checked to unchecked or vice versa.

Anyway that's a long way of saying that you're right, integer overflow is illegal behavior, I just think it's interesting.



Your qsort example is basically the same reason people say C++ is faster than Rust. C++ templates are still a lot more powerful than Rust's system, but that's getting closer and closer every day.

It is?? Can you give some examples of high performance stuff you can do using C++'s template system that you can't do in rust?

They are likely referring to the scope of fine-grained specialization and compile-time codegen that is possible in modern C++ via template metaprogramming. Some types of complex optimizations common in C++ are not really expressible in Rust because the generics and compile-time facilities are significantly more limited.

As with C, there is nothing preventing anyone from writing all of that generated code by hand. It is just far more work and much less maintainable than e.g. using C++20. In practice, few people have the time or patience to generate this code manually so it doesn't get written.

Effective optimization at scale is difficult without strong metaprogramming capabilities. This is an area of real strength for C++ compared to other systems languages.


Again, can you provide an example or two? It's hard to agree or disagree without an example.

I think all C++ wild template stuff can be done via proc macros. E.g., in Rust you can add #[derive(Serialize, Deserialize)] to have a highly performant JSON parser & serializer. And that's just lovely. But I might be wrong? And maybe it's ugly? It's hard to tell without real examples.


Rust doesn't allow specialization and likely never will because it's unsound https://www.reddit.com/r/rust/comments/1p346th/specializatio... has a couple of nice comments about it.

But yes it's basically

template <typename T, size_t N> class Example { std::vector<T> generic; };

template <> class Example<int32_t, 32> { int bitpacking_here; };


Specialization isn’t stable in Rust, but is possible with C++ templates. It’s used in the standard library for performance reasons. But it’s not clear if it’ll ever land for users.

> As with C, there is nothing preventing anyone from writing all of that generated code by hand. It is just far more work and much less maintainable than e.g. using C++20.

It's also still less elegant, but compile time codegen for specialisation is part of the language (build system?) with build.rs & macros. serde makes strong use of this to generate its serialisation/deserialisation code.


And compile time execution.

With C you only have macro soup and the hope the compiler might optimise some code during compilation into some kind of constant values.

With C++ and Rust you're sure that happens.


Rust has linker optimizations that can make it faster in some cases

Huh? Both have LTO. There are no linker optimizations available to Rust that aren't also available to C and C++. They all use the same God damn linker.

A few years ago I pulled a rust library into a swift app on ios via static linking & C FFI. And I had a tiny bit of C code bridge the languages together.

When I compiled the final binary, I ran llvm LTO across all 3 languages. That was incredibly cool.


At that point the real question should be restated. Does the LLVM IR generated from clang and rustc differ in a meaningful way?

Rust's stricter aliasing analysis will provide some fundamentally better optimizations than C.


