The Windows port mapper is needlessly complex, while its job is pretty
straightforward: reserve a port through the port allocator, and start a
dummy proxy to allocate it from the OS.
The biggest source of complexity is the use of the `net.Addr` interface
to pass the host IP, port and proto. `MapRange` now has a proto arg, and
returns the allocated port.
`MapRange` was also instantiating a `mapping` struct whose fields were
all unused except for `stopUserlandProxy`. Instead, store `stopProxy`
callbacks directly in the `PortMapper`.
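
As a rough sketch of the resulting shape (all names and signatures
below are illustrative, not the actual code):

    package sketch // illustrative only

    import (
        "fmt"
        "net"
        "strconv"
    )

    // portAllocator stands in for the real port allocator.
    type portAllocator interface {
        RequestPortInRange(ip net.IP, proto string, portStart, portEnd int) (int, error)
    }

    // PortMapper stores stopProxy callbacks directly instead of wrapping
    // them in a mostly-unused mapping struct.
    type PortMapper struct {
        allocator portAllocator
        stopProxy map[string]func() error // keyed by proto/hostIP:port
    }

    // MapRange takes the proto explicitly and returns the allocated port,
    // rather than passing everything around as net.Addr values.
    func (pm *PortMapper) MapRange(hostIP net.IP, proto string, portStart, portEnd int) (int, error) {
        port, err := pm.allocator.RequestPortInRange(hostIP, proto, portStart, portEnd)
        if err != nil {
            return 0, err
        }
        // Dummy proxy: bind the port so the OS considers it in use
        // (TCP only in this sketch).
        l, err := net.Listen(proto, net.JoinHostPort(hostIP.String(), strconv.Itoa(port)))
        if err != nil {
            return 0, err
        }
        if pm.stopProxy == nil {
            pm.stopProxy = map[string]func() error{}
        }
        pm.stopProxy[fmt.Sprintf("%s/%s:%d", proto, hostIP, port)] = l.Close
        return port, nil
    }
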
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
This function is only called by New, and it takes the singleton
PortAllocator exposed by the portallocator package.
Remove this function and instantiate the PortMapper directly in the
New constructor.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The portmapper struct provided by libnet/portmapper is only available
on Windows. Merge both files to reflect that.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Prior to commit 4f09af626, DeleteForwardingTableEntry had a Linux
implementation. That's not the case anymore, and it's a no-op on
Windows. Remove it.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Prior to commit 4f09af626, AppendForwardingTableEntry had a Linux
implementation. That's not the case anymore, and it's a no-op on
Windows. Remove it.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Internally a network is represented by either a libnetwork.Network or a
swarmapi.Network. The daemon functions backing the API server map
these values to the Engine API network.Inspect type on demand. Since
they have to convert, the functions to get a list of networks have to
loop over the slice of Networks and append them to a slice of
network.Inspect values.
The function used to filter the list of networks by a user-supplied
predicate takes a []network.Inspect and returns a shorter slice.
Therefore the daemon functions backing the API server have to loop
through the list twice: once to convert, and again inside the
FilterNetworks function to delete networks from the slice which do not
match the filter predicate. Each time an item is deleted from a slice,
all items at higher indices need to be copied to lower indices in the
backing array to close the hole.
Replace FilterNetworks with a function that accepts a single
interface-valued network and returns a boolean. Amend libnetwork.Network
and write a thin adapter for swarmapi.Network so both implement the
aforementioned interface. The daemon functions can thus filter networks
before projecting the values into API structs, and can completely skip
over non-matching networks, which cuts down on a nontrivial amount of
copying.
Split the validation of the filter predicate from filter evaluation to
both make it more ergonomic to use inside loops, and to make invalid
states (a filter with an ill-formed predicate) unrepresentable.
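
A sketch of the resulting shape, with placeholder names standing in
for the real daemon and libnetwork identifiers:

    package sketch // illustrative only

    import "errors"

    // filterableNetwork is a stand-in for the interface that both
    // libnetwork.Network and the swarmapi.Network adapter implement.
    type filterableNetwork interface {
        Name() string
        Driver() string
    }

    type networkInspect struct{ Name, Driver string }

    // newNetworkFilter validates the user-supplied predicate once and
    // returns a matcher, so an ill-formed filter is rejected up front and
    // can never reach the per-network loop.
    func newNetworkFilter(wantDriver string) (func(filterableNetwork) bool, error) {
        if wantDriver == "" {
            return nil, errors.New("invalid filter: empty driver")
        }
        return func(nw filterableNetwork) bool { return nw.Driver() == wantDriver }, nil
    }

    // listNetworks filters before projecting into API structs:
    // non-matching networks are skipped entirely, so nothing is appended
    // only to be deleted (and copied down) later.
    func listNetworks(nws []filterableNetwork, match func(filterableNetwork) bool) []networkInspect {
        out := make([]networkInspect, 0, len(nws))
        for _, nw := range nws {
            if !match(nw) {
                continue
            }
            out = append(out, networkInspect{Name: nw.Name(), Driver: nw.Driver()})
        }
        return out
    }
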
Signed-off-by: Cory Snider <csnider@mirantis.com>
newDriver, which creates a new instance of the bridge driver, is the
only place where the driver config field is set. So there's no need to
gate access to it with a mutex.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Libnetwork passes a map[string]any to the bridge driver's Register
function. This forces the daemon to convert its configuration into a
map, and the driver to convert that map back into a struct.
This is unnecessary complexity, and makes it harder to track down where
and how bridge driver configuration fields are set.
Refactor libnetwork to let the daemon register the bridge.Configuration
directly through a new option `OptionBridgeConfig`.
The bridge driver now takes a `Configuration` param that needs no
special treatment.
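
A minimal sketch of the option, using stand-in types (the real bridge
Configuration has many more fields, and the controller config plumbing
differs):

    package sketch // illustrative only

    // Configuration stands in for the bridge driver's typed config.
    type Configuration struct {
        EnableIPTables  bool
        EnableIP6Tables bool
    }

    type controllerCfg struct {
        BridgeConfig *Configuration
    }

    type Option func(*controllerCfg)

    // OptionBridgeConfig lets the daemon hand its typed bridge
    // configuration straight to libnetwork, which forwards it to the
    // bridge driver's Register; nothing is flattened into a
    // map[string]any and parsed back into a struct on the other side.
    func OptionBridgeConfig(c Configuration) Option {
        return func(cfg *controllerCfg) { cfg.BridgeConfig = &c }
    }
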
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
makeDriverConfig is written in such a way that it seems to support
label-based driver configuration. That is, you could hypothetically use
labels starting with `com.docker.network.driver.<driver-name>.` to
define the configuration of a driver.
These labels come from the Controller's `cfg.Labels`, which the daemon
sets through libnet's OptionLabels with the labels passed via dockerd's
--label flag or the equivalent daemon.json field.
However, the daemon forbids setting labels that start with
`com.docker.*`. For instance:
    label com.docker.network.driver.bridge.EnableProxy=false is not allowed: the namespaces com.docker.*, io.docker.*, and org.dockerproject.* are reserved for internal use
Hence, this is dead code — remove it.
Also, makeDriverConfig checks whether the Controller's cfg field is
nil. But the Controller struct is instantiated in a single place
(NewController) and that field is always set. Drop that nil check too.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
This method is only used by the cnmallocator to allocate Swarm-scoped
network resources. Its only concrete implementation is in the ovmanager.
Other network drivers implement it too, merely to satisfy the
driverapi.Driver interface, but they all return a 'not implemented'
error.
Extract this method into a separate interface, and add a dedicated
RegisterNetworkAllocator to the driver registry. Update the cnmallocator
to load 'network allocators' instead of 'drivers'.
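
A sketch of the split; the method set and signatures below are
assumptions for illustration only:

    package sketch // illustrative only

    import "fmt"

    // NetworkAllocator holds the cluster-scoped allocation methods that
    // only the ovmanager implements for real; other drivers no longer
    // need to stub them out just to satisfy driverapi.Driver.
    type NetworkAllocator interface {
        NetworkAllocate(id string, options map[string]string) (map[string]string, error)
        NetworkFree(id string) error
    }

    type registry struct {
        allocators map[string]NetworkAllocator
    }

    // RegisterNetworkAllocator registers a network allocator alongside,
    // but separately from, the regular drivers.
    func (r *registry) RegisterNetworkAllocator(name string, a NetworkAllocator) error {
        if r.allocators == nil {
            r.allocators = map[string]NetworkAllocator{}
        }
        if _, ok := r.allocators[name]; ok {
            return fmt.Errorf("network allocator %q is already registered", name)
        }
        r.allocators[name] = a
        return nil
    }
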
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The cnmallocator package has a map of supported network drivers which
are registered using a pkg-local driver registry. This registry is then
used to load drivers, and if they have a 'local' DataScope, they aren't
used for anything. Drivers with a 'global' DataScope are called to
allocate cluster-wide network resources.
Instantiating builtin network drivers may have unintended side-effects
(e.g. the bridge driver registers a callback that should run when
firewalld is reloaded), so libnetwork has dummy '*manager' drivers that
do nothing but carry the same Capability as the original drivers they
masquerade as.
Put 'local drivers' (i.e. those with DataScope 'local') into a
separate list that just contains driver names, and don't register them
in the cnmallocator's driver registry.
Remove all the dummy '*manager' drivers as they're not needed anymore.
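
Something along these lines (the driver names listed are only
examples):

    package sketch // illustrative only

    import "slices"

    // Local-scoped drivers are only tracked by name, so "is this a
    // builtin driver?" checks keep working; they are never instantiated
    // or registered in the cnmallocator's driver registry.
    var localDrivers = []string{"bridge", "host", "ipvlan", "macvlan", "null"}

    func isLocalDriver(name string) bool {
        return slices.Contains(localDrivers, name)
    }
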
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Add methods to count the number of addresses in the set which have a
particular prefix. The returned counts are 128 bits wide to accommodate
sets containing more than 2**64 addresses.
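
A toy illustration of the 128-bit count; the real set is bitmap-based
and the type and method names here are assumptions:

    package sketch // illustrative only

    import (
        "math/bits"
        "net/netip"
    )

    // addrSet is a toy, map-backed stand-in for the real address set.
    type addrSet struct {
        addrs map[netip.Addr]struct{}
    }

    // countInPrefix reports how many addresses in the set fall under
    // prefix. The count is returned as high/low uint64 halves because an
    // IPv6 set can in principle contain more than 2^64 addresses.
    func (s *addrSet) countInPrefix(prefix netip.Prefix) (hi, lo uint64) {
        for a := range s.addrs {
            if prefix.Contains(a) {
                var carry uint64
                lo, carry = bits.Add64(lo, 1, 0)
                hi += carry
            }
        }
        return hi, lo
    }
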
Signed-off-by: Cory Snider <csnider@mirantis.com>
When generating the rules for an nftables chain, rather than collecting
rules into a slice and iterating over that, use an iterator.
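
For instance, with a Go 1.23 iter.Seq (the chain layout below is a
stand-in, not the real rule storage):

    package sketch // illustrative only

    import "iter"

    // chain is a toy stand-in for the nftables chain.
    type chain struct {
        ruleGroups map[int][]string
        groupOrder []int
    }

    // rules yields the chain's rules in group order, one at a time,
    // without first materialising an intermediate slice.
    func (c *chain) rules() iter.Seq[string] {
        return func(yield func(string) bool) {
            for _, g := range c.groupOrder {
                for _, r := range c.ruleGroups[g] {
                    if !yield(r) {
                        return
                    }
                }
            }
        }
    }
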
Signed-off-by: Rob Murray <rob.murray@docker.com>
Add nftables.Modifier, which holds a queue of commands that can be
applied using Modifier.Apply. No updates are made to the underlying
Table until Apply is called, and errors in queued commands are
deferred until then.
This has the advantages that:
- less error handling is needed in code that generates update commands
- it's transactional, without needing explicit transactions
Minor disadvantages are that updates are slightly more difficult to
debug, as it's no longer possible to step from the call that queues an
update into the Table manipulation in a debugger, and that errors in a
command (such as updating a nonexistent chain/set/vmap, deleting an
object that doesn't exist, or creating a duplicate) are not reported
until the updates are applied (but the file/line where the command was
queued is reported).
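
A condensed sketch of the pattern; the Modifier, Apply and Table names
come from the description above, while the command set and Table
internals are simplified stand-ins:

    package sketch // illustrative only

    import "fmt"

    type Table struct {
        chains map[string][]string
    }

    func (t *Table) createChain(name string) error {
        if _, ok := t.chains[name]; ok {
            return fmt.Errorf("chain %q already exists", name)
        }
        if t.chains == nil {
            t.chains = map[string][]string{}
        }
        t.chains[name] = []string{}
        return nil
    }

    func (t *Table) appendRule(chain, rule string) error {
        if _, ok := t.chains[chain]; !ok {
            return fmt.Errorf("no chain %q", chain)
        }
        t.chains[chain] = append(t.chains[chain], rule)
        return nil
    }

    // Modifier queues commands; nothing touches the Table until Apply.
    type Modifier struct {
        cmds []func(*Table) error
    }

    func (m *Modifier) CreateChain(name string) {
        m.cmds = append(m.cmds, func(t *Table) error { return t.createChain(name) })
    }

    func (m *Modifier) AppendRule(chain, rule string) {
        m.cmds = append(m.cmds, func(t *Table) error { return t.appendRule(chain, rule) })
    }

    // Apply runs the queued commands in order. Errors, including
    // references to objects that don't exist, only surface here, so the
    // code that builds the queue needs no per-command error handling.
    func (m *Modifier) Apply(t *Table) error {
        for _, cmd := range m.cmds {
            if err := cmd(t); err != nil {
                return err
            }
        }
        return nil
    }
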
Signed-off-by: Rob Murray <rob.murray@docker.com>
Debian 13 ships iptables-nft v1.8.11 which returns a different error
than previous versions when doing `iptables -S` for a nonexistent chain.
Older versions:
    ip6tables v1.8.9 (nf_tables): chain `<chain>' in table `filter' is incompatible, use 'nft' tool.
Newer versions:
    ip6tables: No chain/target/match by that name.
Bisecting iptables-nft shows this change was introduced by [1], which
was first released in v1.8.10.
Pick the expected error message based on iptables-nft version.
[1]: https://git.netfilter.org/iptables/commit/?id=82ccfb488eeac5507471099b9b4e6d136cc06e3b
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
When nftablesdoc tests dump the state of nftables, the argument '-y' /
'--numeric-priority' isn't used, so all priorities should be
stringified. However, there's a bug in older versions of nftables that
prevents the stringification of the 'dstnat' priority — it's currently
dumped as '-100'.
New versions fix that, and thus running these tests on Debian 13 fails
because of this discrepancy with golden files.
So, look for 'type nat hook output priority -100' and stringify the
priority to ensure compatibility across versions of nft.
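
The normalisation boils down to a string rewrite over the dump,
roughly like this (the helper name is an assumption, and the named
'dstnat' form is taken from the description above):

    package sketch // illustrative only

    import "strings"

    // normaliseDump rewrites the numeric priority emitted by older nft
    // versions into the named form, so the golden files match on both
    // old and new versions.
    func normaliseDump(dump string) string {
        return strings.ReplaceAll(dump,
            "type nat hook output priority -100",
            "type nat hook output priority dstnat")
    }
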
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Instead of setting up firewall rules directly in the routed port mapper,
we now rely on the bridge driver to handle firewall reconfiguration.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Since commit 4e246efcd, individual portmappers are responsible for
setting up firewall rules and proxying according to their needs.
This change moves the responsibility back to the bridge driver, removing
unnecessary code duplication across portmappers. For now, only the nat
portmapper takes advantage of this.
This partially reverts commit 4e246efcd.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Add two new fields to portmapperapi.PortBinding: NAT and Forwarding.
These can be used by portmappers to specify how they want their callers
(e.g. bridge driver) to reconfigure the host firewall to NAT a host
port, or allow forwarding to the container port.
If portmappers don't want to opt in to these, they can implement their
own firewall rules and leave these fields unset.
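
Roughly, the new fields look like this; the surrounding struct and the
field types are stand-ins, only the NAT and Forwarding names come from
this change:

    package sketch // illustrative only

    import "net/netip"

    // PortBinding is a trimmed-down stand-in for
    // portmapperapi.PortBinding.
    type PortBinding struct {
        HostIP        netip.Addr
        HostPort      uint16
        ContainerIP   netip.Addr
        ContainerPort uint16

        // NAT asks the caller (e.g. the bridge driver) to set up NAT
        // from the host port to the container port.
        NAT bool
        // Forwarding asks the caller to allow forwarding to the
        // container port.
        Forwarding bool
    }
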
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
When an endpoint's net.IPNet is loaded from store and converted
to a netip.Addr, unmap it so that iptables rules don't contain
IPv4-mapped IPv6 addresses.
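
The conversion boils down to something like this (helper name
assumed):

    package sketch // illustrative only

    import (
        "net"
        "net/netip"
    )

    // addrFromIPNet converts the stored net.IPNet to a netip.Addr and
    // unmaps it, so "::ffff:172.17.0.2" becomes "172.17.0.2" before it
    // is used to build iptables rules.
    func addrFromIPNet(ipn *net.IPNet) netip.Addr {
        a, ok := netip.AddrFromSlice(ipn.IP)
        if !ok {
            return netip.Addr{}
        }
        return a.Unmap()
    }
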
Signed-off-by: Rob Murray <rob.murray@docker.com>
The daemon started by the test-integration script needs to run without
firewalld integration to make sure that daemons started by networking
tests will handle firewalld reload without any interference (i.e.
without another daemon racing against them to recreate the iptables
chains).
Most tests already run their own daemons, and the few that don't but
need firewalld integration are updated to start their own.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>