These are contrived benchmarks at the extreme end of things. In real world usage the difference is drowned-out by the delays of so many other things happening in order to complete a handshake and key exchange. The mildly higher performance of RSA 3072 versus RSA 4096 wasn't even a big bonus during the CPU performances we had 15 years ago.
It's roughly half as fast as 4096, which sounds bad until you realize that 3072 is already 20% as fast as 2048, 3% as fast as 1024, and 1% as fast as 512. In terms of performance tradeoff it's downright mild compared to the other steps up.
If I could waive a magic wand and get a 40-100% performance boost on a service by changing 3-4 characters (s/4096/3072/) why wouldn't I take it? (Assuming I need security go to beyond RSA 2028.)
Well, in typical use cases RSA usage is very limited (eg some operations during TLS handshake), so the 40-100% boost wouldn’t be across the board, but likely shave some milliseconds per connection.