Prior to commit b5bf89c31, all socket fds passed to the docker-proxy
were getting the O_NONBLOCK flag set. However, that commit added support
for SCTP socket-passing, and had to conditionally guard this behavior to
not use it on SCTP sockets due to ishidawataru/sctp not clearing the
flag.
A fix was made in ishidawataru/sctp (see [1]), so we can remove that
condition.
[1]: https://github.com/ishidawataru/sctp/commit/4b890084db30
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Since commit b3fabedec, the bridge driver maps ports following a 3-step
process: 1. create a socket, and bind it to the host port; 2. create
iptables rules; 3. start the userland proxy (if it's enabled). This
ensures that the port is really free before inserting iptables rules
that could otherwise disrupt host services.
However, this 3-step process wasn't implemented for SCTP, because we had
no way to instiantiate an SCTP listener from an fd. Since
github.com/ishidawataru/sctp@4719921f9, we can.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The UDP proxy is setting a deadline of 90 seconds when reading from the
backend. If no data is received within this interval, it reclaims the
connection.
This means, the backend would see a different connection every 90
seconds if the backend never sends back any reply to a client.
This change prevents the proxy from eagerly GC'ing such connections by
taking into account the last time a datagram was proxyed to the backend.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The UDP proxy used by cmd/docker-proxy is executing Write and Close in
two separate goroutines, such that a Close could interrupt an in-flight
Write.
Introduce a `connTrackEntry` that wraps a `net.Conn` and a `sync.Mutex`
to ensure that Write and Close are serialized.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
When a UDP server is running on a multihomed server, as is the case with
pretty much _all_ Docker hosts (eg. eth0 + docker0), the kernel has to
choose which source address is used when replying to a UDP client. But
that process is based on heuristics and is fallible.
If the address picked doesn't match the original destination address
used by the client, it'll drop the datagram and return an ICMP Port
Unreachable.
To prevent that, we need to:
- `setsockopt(IP_PKTINFO)` on proxy's sockets.
- Extract the original destination address from an ancillary message
every time a new 'UDP connection' is 'established' (ie. every time we
insert a new entry into the UDP conntrack table).
- And finally, pass a control message containing the desired source
address to the kernel, every time we send a response back to the
client.
Also, update the inline comment on read errors in `(*UDPProxy).Run()`.
This comment was misleadingly referencing ECONNREFUSED - Linux's UDP
implementation never returns this error (see [1]). Instead, state why
`net.ErrClosed` is perfectly fine and doesn't need to be logged
(although, docker-proxy currently logs to nowhere).
[1]: https://github.com/search?q=repo%3Atorvalds%2Flinux+ECONNREFUSED+path%3A%2F%5Enet%5C%2F%28ipv4%7Cipv6%29%5C%2F%28udp%7Ctcp%29%2F&type=code
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The daemon was modified to tell RootlessKit about host port
mappings directly, rather than by running rootlesskit-docker-proxy
to make those updates.
DNAT rules created in rootless mode referred to the host IP address,
rather than the address seen as host address in the rootless network
namespace.
With these changes, port mappings work in rootless mode when
--userland-proxy=false - so, don't gate the RootlessKit API calls
on starting docker-proxy.
Signed-off-by: Rob Murray <rob.murray@docker.com>
In preparation for the daemon passing a listen fd, add command line
option -use-listen-fd to indicate that the fd is present (as fd 4).
If the new flag isn't given, open the listener as normal.
Refactor the TCP and UDP proxies to be constructed with an existing
TCPListener or UDPConn, respectively. Lift the responsibilty of opening
the listener to the entrypoint. Per the Single Responsibility Principle,
this structure affords changing how the listener is created without
having to touch the proxy implementations.
Co-authored-by: Cory Snider <csnider@mirantis.com>
Signed-off-by: Rob Murray <rob.murray@docker.com>
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Perhaps the testutils package in the past had an `init()` function to set up
specific things, but it no longer has. so these imports were doing nothing.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Since this command is part of the official distribution and even
required for tests, let's move this up to the main cmd's.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>