Move the listen syscall to the `OSAllocator` such that when `RequestPortsInRange` returns, callers are guaranteed that the allocated port isn't used by another process.

Bind and listen syscalls were previously split because listening before inserting DNAT rules could cause connections to be accepted by the kernel, so packets would never be forwarded to the container.

But, pulling them apart has an undesirable drawback: if another process is racing against the Engine, and starts listening on the same port, the conflict wouldn't be detected until OSAllocator's callers issue a 'listen' syscall. This means that callers need to implement their own retry logic.

To overcome both drawbacks, set a cBPF socket filter on the socket before it's bound, and let callers call `DetachSocketFilter` to remove it. Now, callers are guaranteed that the port is free to use, and no connections will be accepted prematurely.

For TCP / SCTP clients, this means that they'll send the first handshake packet (e.g. SYN), but the kernel won't reply (e.g. SYN-ACK), and they will retry until DNAT rules are configured or the socket filter is removed.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>