Previous commit changed the OSAllocator to listen after binding a port,
such that we're 100% sure that the port is free. We can now make the
OSAllocator responsible for retrying port allocations when it tries to
find an ephemeral port, or a free port in a range.
Move the retry logic from the 'nat' portmapper to the OSAllocator.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Move the listen syscall to the `OSAllocator` such that when
`RequestPortsInRange` returns, callers are guaranteed that the allocated
port isn't used by another process.
Bind and listen syscalls were previously split because listening before
inserting DNAT rules could cause connections to be accepted by the
kernel, so packets would never be forwarded to the container.
But, pulling them apart has an undesirable drawback: if another process
is racing against the Engine, and starts listening on the same port,
the conflict wouldn't be detected until OSAllocator's callers issue a
'listen' syscall. This means that callers need to implement their own
retry logic.
To overcome both drawbacks, set a cBPF socket filter on the socket
before it's bound, and let callers call `DetachSocketFilter` to remove
it. Now, callers are guaranteed that the port is free to use, and no
connections will be accepted prematurely.
For TCP / SCTP clients, this means that they'll send the first handshake
packet (e.g. SYN), but the kernel won't reply (e.g. SYN-ACK), and they
will retry until DNAT rules are configured or the socket filter is
removed.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
- Use a subdirectory for all files used in the test
- Add a .golden file-extension for easier discovery of generated files
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The only references to blockLen type-assert the uint32 constant to other
widths. Make it an untyped int to cut down on unnecessary boilerplate.
Rewrite the genNumBlocks utility function to use the well-known
algorithm for rounding-up integer division instead of branching. Inline
it into the only call site.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The `TransporPort` type is comparable; it doesn't have fields that
require special handling. It's defined as;
// TransportPort represents a local Layer 4 endpoint
type TransportPort struct {
Proto Protocol
Port uint16
}
where `Protocol` is an int (with a stringer interface);
type Protocol uint8
So we can remove the `Equal` method, and simplify places where it's
compared.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The `GetCopy()` function doesn't de-reference anything, as it's
all a straight copy. We can remove it as it's only making things
more complicated than needed.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Modernize using maps.Clone, slices.Clone. This method is needed to
satisfy the datastore.KVObject interface, so also assert it does.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The endpointJoinInfo.CopyTo function expected the caller to construct an
EndpointInterface to copy to, but all callsites created an empty struct.
In addition, `CopyTo` would never return an error, so the error return
was redundant.
Replace it with a `Copy()` function, which makes it easier to
consume.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The EndpointInterface.CopyTo function expected the caller to construct an
EndpointInterface to copy to, but all callsites created an empty struct.
In addition, `CopyTo` would never return an error, so the error return
was redundant.
Replace it with a `Copy()` function, which makes it easier to
consume.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Change `PortBinding.Equal` to use a value receiver and parameter, this
allows us to use it directly with `slices.IndexFunc`, `DeleteFunc`,
without having to add a wrapper func.
The only exception currently is the `UnmapPorts` function (stub), which
takes portmapperapi.PortBinding as argument; the portmapperapi.PortBinding
type embeds `types.PortBinding`, and it's the only field that's compared
as part of `UnmapPorts`
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
- Rename `PortBinding.GetCopy()` to `PortBinding.Copy()`, which is more
idiomatic, and aligns with other similar methods.
- Change it to a non-pointer receiver; `Copy` does not mutate state, and
the type should still be reasonably small.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Rewrite both to use slices.Clone, and GetIPNetCanonical to not depend
on GetIPNetCopy. GetIPNetCopy only has a single consumer, so we should
consider moving it local to where it's used.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The IpamInfo.CopyTo function expected the caller to construct an
IpamInfo to copy to, but all callsites created an empty struct.
In addition, `CopyTo` would never return an error, so the error
return was redundant.
Replace it with a `Copy()` function, which makes it easier to
consume.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The IpamConf.CopyTo function expected the caller to construct an
IpamConf to copy to, but all callsites created an empty struct.
In addition, `CopyTo` would never return an error, so the error
return was redundant.
Replace it with a `Copy()` function, which makes it easier to
consume.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
For nftables only, never enable IP forwarding on the host. Instead,
return an error on network creation if forwarding is not enabled,
required by a bridge network, and --ip-forward=true.
If IPv4 forwarding is not enabled when the daemon is started with
nftables enabled and other config at defaults, the daemon will
exit when it tries to create the default bridge.
Otherwise, network creation will fail with an error if IPv4/IPv6
forwarding is not enabled when a network is created with IPv4/IPv6.
It's the user's responsibility to configure and secure their host
when they run Docker with nftables.
Signed-off-by: Rob Murray <rob.murray@docker.com>
The userland proxy uses unconnected UDP sockets to receive packets from
anywhere, so enabling SO_REUSEADDR means that multiple sockets can bind
the same port. This defeats the purpose of the portallocator, which is
supposed to ensure that the port is free and not already in use (either
by us, or by another process). So, do not enable SO_REUSEADDR for UDP
sockets.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
If unmarshaling the peer record fails, there is no need to check if it's
a record for a local peer. Attempting to do so anyway will result in a
nil-dereference panic. Don't do that.
The Windows overlay driver has a typo: prevPeer is being checked twice
for whether it was a local-peer record. Check prevPeer once and newPeer
once each, as intended.
Signed-off-by: Cory Snider <csnider@mirantis.com>
- remove/rename named error-return
- remove redundant defer
- use "continue" to reduce nesting
- use structured logs
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
handleFirewalldReloadNw locks `d.mu` and then `d.configNetworks`.
However, the rest of the driver locks `d.configNetworks` first and then
`d.mu`.
This could result in deadlocks if `handleFirewalldReloadNw` is called
while the bridge driver is already holding `d.configNetworks` lock.
Other code paths were checked to ensure that they all follow the same
locking order.
This bug was introduced by commit a527e5a.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The bridge driver was embedding `sync.Mutex` which is unconventional and
makes it harder to analyze locks ordering.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Define a `RouteType` type, type the related consts, and update the
`JoinInfo.AddStaticRoute` signature in the interface.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These use the Linux-specific values as convention, so unfortunately,
the syscall package doesn't define consts for these on Windows, so
keeping our own definition (values are not really relevant here).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Define a type to help discovery, and update the signatures of
`ResolveName`, `Network.ResolveName`, and `Sandbox.ResolveName`
accordingly.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Modernize the field and allow using it as-is in some places, or
convert it to a string (which won't produce an error down the line).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This package is a wrapper for the libnetwork/internal/resolvconf package,
which is a modernized, more performant rewrite of the original parsing
code.
The libnetwork/resolvconf package was still maintained because it was
used by BuildKit, but since [moby/buildkit@3d43066], BuildKit maintains
its own copy of the internal package.
The only remaining uses of this package was as part of some tests (which
would also benefit of using the internal pacakge's implementation directly),
and a _single_ use of `resolvconf.Path` in the daemon, which cannot use
the internal package currently because it's internal to libnetwork.
This patch:
- Removes all functions that were not used.
- Rewrites some tests in libnetwork to use the internal/resolvconf package
directly, instead of depending on the wrapper.
- Add TODOs to consider moving the "Path" function separate (which could
be in daemon/config if we consider it to be the default for the daemon's
resolvconf path configuration).
[moby/buildkit@3d43066]: 3d43066f2e
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>