moby/daemon/libnetwork/portallocator
Albin Kerouanton 201968cc03 libnet/pa: OSAllocator: listen after bind
Move the `listen` syscall into the `OSAllocator` so that when
`RequestPortsInRange` returns, callers are guaranteed that the allocated
port isn't used by another process.

The `bind` and `listen` syscalls were previously split because listening
before the DNAT rules are inserted could let the kernel accept connections
on the host socket itself, and those connections would never be forwarded
to the container.

However, splitting them has an undesirable drawback: if another process
races against the Engine and starts listening on the same port, the
conflict isn't detected until OSAllocator's callers issue the `listen`
syscall. As a result, callers need to implement their own retry logic.

To address both problems, attach a cBPF socket filter to the socket
before it's bound, and let callers call `DetachSocketFilter` to remove
it. Callers are now guaranteed that the port is free to use, and that no
connections will be accepted prematurely.

For TCP and SCTP clients, this means they'll send the first handshake
packet (e.g. SYN), but the kernel won't reply (e.g. with a SYN-ACK), so
they'll keep retrying until the DNAT rules are configured or the socket
filter is removed.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2025-08-20 12:02:04 +02:00