Lambdas are designed to be a zero-overhead feature. In the worst case, a lambda costs the same as passing an additional void* ctx argument to a plain old function, which is the most common pattern in C and pre-lambda C++.
But it's not. A lambda is merely syntactic sugar for a function object (a functor), and as such it can carry state (the term is 'closure' in this case); depending on what is being captured, its creation can even involve a heap allocation.
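To make the "syntactic sugar for an object" point concrete, here is a minimal sketch of a lambda next to a hand-written function object with the same behavior. The names AddOffset, add_with_lambda, add_with_functor and check_equivalence are made up for illustration; the real closure type the compiler generates is unnamed.

```cpp
#include <cassert>

// Hypothetical hand-written counterpart of the closure type for
// [offset](int v) { return v + offset; }.
struct AddOffset {
    int offset;  // the captured state, stored by value
    int operator()(int v) const { return v + offset; }
};

// Same computation, written with a lambda.
int add_with_lambda(int offset, int x) {
    auto add = [offset](int v) { return v + offset; };
    return add(x);
}

// Same computation, written with the explicit function object.
int add_with_functor(int offset, int x) {
    AddOffset add{offset};
    return add(x);
}

// Small self-check: both forms compute the same result.
void check_equivalence() {
    assert(add_with_lambda(10, 5) == add_with_functor(10, 5));
}
```

Neither version allocates here: the only state is one int held inside the object itself.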
It is: "zero-overhead" is defined to mean you could not open-code it yourself any better.
If you specify a capture-by-copy, you have specified a copy. If that copy involves an allocation, you have specified an allocation. There is exactly zero extra overhead: in the overwhelmingly most common uses, not even a call through a pointer.
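A rough way to see "you have specified a copy" is to count copy-constructor calls for different capture modes. This is only a sketch: Tracked, g_copies and the three helper functions are hypothetical names, not part of any library.

```cpp
#include <cassert>
#include <utility>

int g_copies = 0;  // counts Tracked copy constructions (illustration only)

struct Tracked {
    Tracked() = default;
    Tracked(const Tracked&) { ++g_copies; }
    Tracked(Tracked&&) noexcept = default;
};

// [t] asks for a copy, so one copy is made when the closure is created.
int copies_with_value_capture() {
    g_copies = 0;
    Tracked t;
    auto f = [t] { (void)t; };
    f();
    return g_copies;
}

// [&t] stores a reference, so no copy is made.
int copies_with_reference_capture() {
    g_copies = 0;
    Tracked t;
    auto f = [&t] { (void)t; };
    f();
    return g_copies;
}

// A C++14 init-capture moves the object into the closure instead of copying.
int copies_with_move_capture() {
    g_copies = 0;
    Tracked t;
    auto f = [moved = std::move(t)] { (void)moved; };
    f();
    return g_copies;
}

// Small self-check of the three cases.
void check_copy_counts() {
    assert(copies_with_value_capture() == 1);
    assert(copies_with_reference_capture() == 0);
    assert(copies_with_move_capture() == 0);
}
```

If Tracked owned a heap buffer, the by-copy capture would allocate, exactly as a hand-written copy would.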
The good news is that the compiler knows all about lambdas, so it can optimize the hell out of them.
By "careful" I mean avoiding unnecessary use or the temptation of capturing "everything." All too often I see people get enamored with it or fall into the trap of following patterns popular in other, slower, languages.
Not sure I get your point - the exact same is true of a C function which is passed a void* ctx object, since those objects have to be allocated and their lifecycle has to be managed. It's still a zero-overhead feature.
A pointer argument passed in a register is not what we mean when we say "overhead".
Without the lambda, you would instead need to pass the pointer manually, for exactly the same cost. But then the compiler would understand less of what you were doing, and would be less able to optimize it. Here, the lambda gives you negative overhead vs. what you would have written.
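For comparison, a sketch of the two styles side by side: the C-style callback with a manual void* ctx, and a template that takes a lambda. All names here (Visitor, for_each_c, for_each_cpp, add_to_sum, sum_c, sum_lambda, check_sums) are invented for the example. Whether the lambda version actually ends up faster depends on the compiler and flags; the point is only that the template sees the concrete closure type, while the C version goes through a function pointer unless the optimizer can see through it.

```cpp
#include <cassert>
#include <cstddef>

// C style: a function pointer plus an untyped context pointer.
using Visitor = void (*)(int value, void* ctx);

void for_each_c(const int* data, std::size_t n, Visitor visit, void* ctx) {
    for (std::size_t i = 0; i < n; ++i) visit(data[i], ctx);
}

void add_to_sum(int value, void* ctx) {
    *static_cast<long*>(ctx) += value;  // caller must pass a long*
}

long sum_c(const int* data, std::size_t n) {
    long sum = 0;
    for_each_c(data, n, add_to_sum, &sum);
    return sum;
}

// Lambda style: the callable's type is a template parameter, so the
// call target is known at compile time and is a candidate for inlining.
template <typename F>
void for_each_cpp(const int* data, std::size_t n, F&& visit) {
    for (std::size_t i = 0; i < n; ++i) visit(data[i]);
}

long sum_lambda(const int* data, std::size_t n) {
    long sum = 0;
    for_each_cpp(data, n, [&sum](int v) { sum += v; });  // captures &sum
    return sum;
}

// Small self-check: both styles produce the same sum.
void check_sums() {
    const int data[] = {1, 2, 3, 4};
    assert(sum_c(data, 4) == 10);
    assert(sum_lambda(data, 4) == 10);
}
```

In both versions the "context" is one pointer to sum; the lambda just lets the compiler keep the type information that the void* cast throws away.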