> And that pattern is the one that is used by GOs http libraries
I don't think that is correct. In https://news.ycombinator.com/item?id=34213383, I noticed that Go's HTTP/2 library would write the HEADERS frame, the DATA frame, and the terminal HEADERS frame in 3 separate syscalls. In a sample application using Go's HTTP/2 library, a gRPC response without Nagle's algorithm would transmit 497 bytes over 6 packets, while the same response with Nagle's algorithm would transmit 275 bytes over 2 packets.
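To illustrate what "buffering accordingly" means here, a minimal sketch (not Go's actual HTTP/2 code). The `countingWriter` type, the helper names, and the frame payloads are all made up for illustration; each `Write` that reaches `countingWriter` stands in for one write syscall on a real socket.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter is a hypothetical stand-in for a TCP connection:
// every Write call on it corresponds to one write syscall on a real socket.
type countingWriter struct {
	writes int
	bytes  int
}

func (w *countingWriter) Write(p []byte) (int, error) {
	w.writes++
	w.bytes += len(p)
	return len(p), nil
}

// writeFrames writes each frame to w with its own Write call.
func writeFrames(w io.Writer, frames [][]byte) error {
	for _, f := range frames {
		if _, err := w.Write(f); err != nil {
			return err
		}
	}
	return nil
}

// unbufferedWrites reports how many Write calls reach the connection
// when the frames are written to it directly.
func unbufferedWrites(frames [][]byte) (int, error) {
	conn := &countingWriter{}
	err := writeFrames(conn, frames)
	return conn.writes, err
}

// bufferedWrites reports how many Write calls reach the connection
// when the frames go through a bufio.Writer that is flushed once at the end.
func bufferedWrites(frames [][]byte) (int, error) {
	conn := &countingWriter{}
	bw := bufio.NewWriter(conn) // default buffer size is 4096 bytes
	if err := writeFrames(bw, frames); err != nil {
		return 0, err
	}
	err := bw.Flush()
	return conn.writes, err
}

func main() {
	// Placeholder payloads, not real HTTP/2 frames.
	frames := [][]byte{
		[]byte("HEADERS"),
		[]byte("DATA"),
		[]byte("TRAILERS"),
	}
	direct, _ := unbufferedWrites(frames)
	buffered, _ := bufferedWrites(frames)
	fmt.Println("direct writes:  ", direct)
	fmt.Println("buffered writes:", buffered)
}
```

With the direct path and Nagle's algorithm disabled, each small write is free to leave as its own packet. With the buffered path, the whole response reaches the kernel in one write, so the Nagle setting matters much less.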
With a starting point where both Nagle's algorithm and delayed ack are enabled, I guess this is the order of preference:
1. delayed ack disabled, applications do the right thing by buffering accordingly - ideal performance, but it is difficult to disable delayed ack, and it may require a lot of work to fix the applications.
2a. Nagle's algorithm disabled, applications do the right thing by buffering accordingly - almost ideal performance (may perform worse than #1 over a bad connection), but it may require a lot of work to fix the applications.
2b. delayed ack disabled, real-world applications - almost ideal performance (may have higher syscall overhead than #1), but it is difficult to disable delayed ack.
3. Nagle's algorithm disabled, real-world applications - not ideal as some applications can suffer from high packet overhead, e.g. git-lfs, and this is where we are with Go.
4. baseline - far from ideal as many applications can suffer from high latency due to the bad interaction between Nagle's algorithm and delayed ack.
I would say Go has made the right trade-off, albeit with a slight hint of "we know better than you". Going forward, it is probably cheaper for the Linux kernel to come up with a better API to disable delayed ack (i.e. to achieve #2b) than to get the affected applications to do the right thing by buffering accordingly (i.e. to achieve #1 or #2a). We will see how soon https://github.com/git-lfs/git-lfs/issues/5242 can be resolved.
In the meantime, #2b can actually be achieved with an "SRE approach" by patching the kernel to remove delayed ack and patching the Go library to remove the `setNoDelay` call. Something for OP to try?
I just learnt about `ip route change ROUTE quickack 1` from https://news.ycombinator.com/item?id=10662061, so we don't even need to patch the kernel. This makes #2b a really attractive option.
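On the Go side, a library patch may not be needed either if the application controls how its connections are dialed. A rough sketch, assuming an `http.Client` with a custom `DialContext`; the names `dialWithNagle`, `newNagleClient` and `dialLoopback` are made up for illustration, while `(*net.TCPConn).SetNoDelay` is the public API in the `net` package. I have not measured this against the git-lfs case.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

// dialWithNagle dials a TCP connection and re-enables Nagle's algorithm on it.
// Go's net package enables TCP_NODELAY by default on new TCP connections;
// SetNoDelay(false) switches it back off.
func dialWithNagle(ctx context.Context, network, addr string) (net.Conn, error) {
	d := net.Dialer{Timeout: 10 * time.Second}
	conn, err := d.DialContext(ctx, network, addr)
	if err != nil {
		return nil, err
	}
	if tc, ok := conn.(*net.TCPConn); ok {
		if err := tc.SetNoDelay(false); err != nil {
			conn.Close()
			return nil, err
		}
	}
	return conn, nil
}

// newNagleClient returns an HTTP client whose TCP connections are created
// by dialWithNagle.
func newNagleClient() *http.Client {
	return &http.Client{
		Transport: &http.Transport{
			DialContext: dialWithNagle,
			// A custom DialContext turns off automatic HTTP/2 unless this is set.
			ForceAttemptHTTP2: true,
		},
	}
}

// dialLoopback exercises dialWithNagle against a throwaway loopback listener.
func dialLoopback() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()

	conn, err := dialWithNagle(context.Background(), "tcp", ln.Addr().String())
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	if err := dialLoopback(); err != nil {
		fmt.Println("loopback dial failed:", err)
		return
	}
	fmt.Println("dialed loopback with Nagle's algorithm re-enabled")

	_ = newNagleClient() // use like any other *http.Client
}
```

This only covers connections the application dials itself. A server would have to do the equivalent on accepted connections, and combining this with `quickack` on the route is what gets you to #2b.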