Why does a congestionless network matter here? Nagle's algorithm aggregates writes together in order to fill up a packet. But you can just do that yourself, and then you're not surprised. I find it very rare that anyone is accidentally sending partially-filled packets; they have some data and they want it to be sent now, and are instead surprised that it doesn't get sent now because it happens to be small enough to fit in a single packet. Nobody is reading a file a byte at a time and then passing that 1 byte buffer to Write on a socket. (Except... git-lfs I guess?)
Nagle's algorithm is super weird in that it's saying "I'm sure the programmer did this wrong, here, let me fix it." Then, the 99.99% of the time when you're not doing it wrong, the latency it introduces is too high for anything realtime. It's kind of a weird tradeoff, but I'm sure it made sense at the time as a quick fix for broken telnet clients.
> Nagle's algorithm aggregates writes together in order to fill up a packet.
Not quite an accurate description of Nagle's algorithm. It only aggregates writes together if you already have in-flight data. The second you get back an ACK, the next packet will be sent regardless of how full it is. Equally, your first write to the socket will always be sent without delay.
The case where you want to send many tiny packets with minimal latency doesn’t really make sense for TCP, because eventually the packet overhead and traffic control algorithms will end up throttling your throughput and latency. Nagle only impacts cases where you’re trying to use TCP in an almost pathological manner, and it elegantly handles that behaviour to minimise overheads and the associated throughput and latency costs.
If there’s a use case where latency is your absolute top priority, then you should be using UDP, not TCP. TCP will always nobble your latency because it insists on ordered data delivery, and will delay just-received packets if they arrive ahead of preceding packets. Only UDP gives you the ability to opt out of that behaviour, ensure that data is sent and received as quickly as your network allows, and let your application decide for itself how to handle missing data.
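For reference, the per-write decision Nagle makes boils down to something like this (an illustrative C-style sketch of the logic described above, with made-up names and the send-window check omitted; it's not any particular stack's code):

    /* Illustrative sketch of Nagle's send decision, not real kernel code. */
    if (buffered_bytes >= mss) {
        send_segment();       /* a full segment's worth of data: send immediately */
    } else if (!unacked_data_in_flight) {
        send_segment();       /* nothing outstanding: even a small write goes out right away */
    } else {
        keep_buffering();     /* hold the small segment until the in-flight data is ACKed */
    }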
It makes perfect sense if you consider the right abstraction. TCP connections are streams. There are no packets on that abstraction level. You’re not supposed to care about packets. You’re not supposed to know how large a packet even is.
The default is an efficient stream of bytes that has some trade-off to latency. If you care about latency, then you can set a flag.
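Concretely, on a POSIX system that flag is TCP_NODELAY, set per socket. A minimal sketch, assuming fd is an already-connected TCP socket and keeping error handling to a perror:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <stdio.h>
    #include <sys/socket.h>

    /* Disable Nagle's algorithm so small writes are sent immediately. */
    static int set_nodelay(int fd)
    {
        int one = 1;
        if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) < 0) {
            perror("setsockopt(TCP_NODELAY)");
            return -1;
        }
        return 0;
    }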
There is no perfect abstraction. Speed matters. A stream where data is delivered ASAP is better than a stream where the data gets delayed... maybe... because the OS decides you didn't write enough data.
The default actually violates the abstraction more because now you care how large a packet is, because somehow writing a smaller amount of data causes your latency to spike for some mysterious reason.
> A stream where data is delivered ASAP is better than a stream where the data gets delayed
That depends on your situation, because as you say no abstraction is perfect. Having a stream delivered “faster” isn’t helpful if it means your overhead makes up 50% of your traffic, which is exactly what Nagle avoids.
Nagle’s algorithm is also pretty smart: it’s only going to delay your next packet until it’s either full, or the far end has acknowledged your preceding packet. If you’ve got a crap ton of data to send and you’re dumping it straight into the TCP buffer, then Nagle won’t delay anything, because there’s enough data to fill packets. Nagle only kicks in if you’re doing many frequent tiny writes to a TCP connection, which is rarely a valid thing to do if you care about latency and throughput, so Nagle’s algorithm assuming the dev has made a mistake is reasonable.
If you really care about stream latency, then UDP is your friend. Then you can completely dispense with all the traffic control processes in TCP and have stuff sent exactly when you want it sent.
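For example, a bare-bones UDP sender looks something like this (a sketch with hypothetical names; fd is assumed to be a socket created with SOCK_DGRAM, and error handling is omitted). Each sendto() becomes one datagram that goes out immediately, with no Nagle, no retransmission and no ordering imposed by the kernel:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Send a single datagram to ip:port over an existing SOCK_DGRAM socket. */
    static ssize_t send_datagram(int fd, const char *ip, uint16_t port,
                                 const void *msg, size_t len)
    {
        struct sockaddr_in dst;
        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port   = htons(port);
        inet_pton(AF_INET, ip, &dst.sin_addr);
        return sendto(fd, msg, len, 0, (struct sockaddr *)&dst, sizeof(dst));
    }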
In those cases it would be better to call writev(), which was designed to coalesce multiple buffers into one write call.
How it sends the data is, however, up to the implementation, and whether it delays the last send if the TCP buffer isn't entirely full I'm not sure; but it doesn't make sense to do so, so I would guess not.
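For example (a minimal sketch; fd is assumed to be a connected TCP socket and the hypothetical header/body pair is just for illustration):

    #include <string.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    /* Hand both buffers to the kernel in one syscall; it appends them to the
       stream together, so they can be packed into as few segments as possible. */
    static ssize_t send_header_and_body(int fd, const char *hdr, const char *body)
    {
        struct iovec iov[2];
        iov[0].iov_base = (void *)hdr;
        iov[0].iov_len  = strlen(hdr);
        iov[1].iov_base = (void *)body;
        iov[1].iov_len  = strlen(body);
        return writev(fd, iov, 2);   /* short writes still possible, as with write() */
    }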
Nagle's algorithm matters because the abstraction that TCP works on, and which was inherited by the BSD socket interface, is that of emulating a full-duplex serial port.
Compare with the OSI stack, where packetization is explicit at all layers, and thus it wouldn't have such an issue in the first place.
Yeah it seems crazy to have that kind of hack in the entire network stack and on by default just because some interactive remote terminal clients didn't handle that behavior themselves.
Most clients that OP deals with, anyway. If your code runs exclusively in a data center, like the kind I suspect Google has, then the situation is probably reversed.
Consider the rise of mobile devices. Devices that don't have a good internet connection are probably everywhere now.
It's no longer like 10 years ago, when you either had good internet or no internet at all. The number of devices with a shitty network has grown a lot compared to the past.
Almost every application I've written atop a TCP socket batches up writes into a buffer and then flushes out the buffer. I'd be curious to see how often this doesn't happen.
Are you replying to the correct person? I don't think I ever mentioned how you should write a program. I only said that assuming users have a good internet connection is a naive idea nowadays. (GTA 5 is the worst example in my opinion: lose a few UDP packets and your whole game exits to the main menu. How the f**k did the devs assume UDP packets never get lost?)
What I mean to say is that whether or not your mobile device has bad internet shouldn't matter. Most applications are buffering their reads and writes. This makes TCP_NODELAY a non-issue.
Most importantly, buffering doesn't spend a whole bunch of CPU time context switching into the kernel. Even if you are taking advantage of Nagle's, every call to write is a syscall, which goes into the kernel to perform the write. On a mobile device this would tank your battery. That is the main reason writes are buffered in applications.
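A bare-bones version of that pattern looks something like this (a sketch with a hypothetical 4 KiB buffer; real code would also handle short writes and errors):

    #include <string.h>
    #include <unistd.h>

    #define BUF_CAP 4096                      /* example size, pick to taste */

    static char   outbuf[BUF_CAP];
    static size_t outlen;

    /* The write() syscall happens only when a buffer's worth of data exists
       (or when the application explicitly flushes, e.g. at end of message). */
    static void flush_out(int fd)
    {
        if (outlen > 0) {
            write(fd, outbuf, outlen);        /* short-write/error handling omitted */
            outlen = 0;
        }
    }

    /* Accumulate small writes in user space instead of one syscall per write. */
    static void buffered_write(int fd, const void *data, size_t len)
    {
        if (outlen + len > BUF_CAP)
            flush_out(fd);
        if (len >= BUF_CAP) {                 /* oversized writes bypass the buffer */
            write(fd, data, len);
            return;
        }
        memcpy(outbuf + outlen, data, len);
        outlen += len;
    }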
This is basically the first thing I check when diagnosing performance issues with network apps. Most probably are buffering now, but surprisingly many don't. MySQL's client library, for example, didn't for years (it's probably been fixed for a decade or more at this point).
If you run all of your code in one datacenter, and it never talks to the outside world, sure. That is a fairly rare usage pattern for production systems at Google, though.
Just like anyone else, we have packet drops and congestion within our backbone. We like to tell ourselves that the above is less frequent in our network than the wider internet, but it still exists.