Starting with commit 0d6e7cd983,
DeleteNeighbor() needs to be called with the same options as the
AddNeighbor() call that created the neighbor entry. The calls in peerdb
were modified incorrectly, resulting in the deletes failing and leaking
neighbor entries. Fix up the DeleteNeighbor calls so that the FDB entry
is deleted from the FDB instead of the neighbor table, and the neighbor
is deleted from the neighbor table instead of the FDB.
Signed-off-by: Cory Snider <csnider@mirantis.com>
When libnetwork receives a watch event for a driver table entry from
NetworkDB it passes the event along to the interested driver. This code
contains a subtle bug: update events from NetworkDB are passed along to
the driver as Delete events! This bug was lying dormant as driver-table
entries can only be added by the driver, not updated. Now that NetworkDB
broadcasts an UpdateEvent to watchers if the entry is already known to
the local NetworkDB, irrespective of whether the event received from the
remote peer was a CREATE or UPDATE event, the bug is causing problems.
Whenever a remote node replaces an entry in the overlay_peer_table but
the intermediate delete state was not received by the local node, the
new CREATE event would be translated to an UpdateEvent by NetworkDB and
subsequently handled by the overlay driver as if the entry was deleted!
Bubble table UPDATE events up to the network driver as Update events.
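A minimal sketch of the corrected translation, assuming the agent-side
watch loop maps networkdb events onto driverapi event types along these
lines (the helper function itself is illustrative, not the actual agent
code):

    // driverEventType is an illustrative helper, not the real agent code.
    func driverEventType(ev any) driverapi.EventType {
        switch ev.(type) {
        case networkdb.CreateEvent:
            return driverapi.Create
        case networkdb.UpdateEvent:
            // This case used to return driverapi.Delete, which is the
            // bug described above.
            return driverapi.Update
        case networkdb.DeleteEvent:
            return driverapi.Delete
        default:
            return 0
        }
    }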
Signed-off-by: Cory Snider <csnider@mirantis.com>
The Inter-Network Communication (INC) rules in the iptables chains
DOCKER-ISOLATION-STAGE-1 / DOCKER-ISOLATION-STAGE-2 (which are
called from filter-FORWARD) currently:
- Block access from containers in one bridge network, to ports
published to host addresses by containers in other bridge
networks, when the userland-proxy is disabled.
- But, that access is allowed when the proxy is enabled.
- Block access to all ports on container addresses in gateway
mode "nat-unprotected" networks.
- But, those ports can be accessed from anywhere else, including
other hosts. Just not other bridge networks.
- Allow access from containers in "nat" bridge networks to published
ports on container addresses in "routed" networks. But, to do that,
extra INC rules are added for the routed network.
The INC rules are no longer needed to block access from containers
in one network to unpublished ports on container addresses in
other networks. Direct routing to containers in NAT networks is
blocked by the "raw-PREROUTING" rules that block access from
untrusted interfaces (all interfaces apart from the network's
own bridge).
Drop these INC rules to resolve the inconsistencies listed above,
with this change:
- Published ports on host addresses can be accessed from containers
in other networks (even without the userland-proxy).
- The rules for direct routing between bridge networks are the same
as the rules for direct routing from outside the Docker host
(allowed for gw modes "routed" and "nat-unprotected", disallowed
for "nat").
- Fewer rules, so it's simpler, and perhaps slightly faster.
Internal networks (with no access to networks outside the host)
are also implemented using rules in the DOCKER-ISOLATION chains.
This change moves those rules to a new chain, DOCKER-INTERNAL,
and drops the DOCKER-ISOLATION chains.
Signed-off-by: Rob Murray <rob.murray@docker.com>
A network node is responsible both for broadcasting table events for
entries it owns and for rebroadcasting table events it has received from
other nodes. Table events to be broadcast are added to a single queue
per network, including events for rebroadcasting. As the memberlist
TransmitLimitedQueue is (to a first approximation) LIFO, a flood of
events from other nodes could delay the broadcasting of
locally-generated events indefinitely. Prioritize broadcasting local
events by splitting up the queues and only pulling from the rebroadcast
queue if there is free space in the gossip packet after draining the
local-broadcast queue.
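A minimal sketch of the prioritization, assuming one
memberlist.TransmitLimitedQueue for local events and one for rebroadcasts
per network (the helper and its size accounting are illustrative, not the
actual NetworkDB code):

    // getBroadcasts drains the local-broadcast queue first and only fills
    // whatever space is left in the gossip packet from the rebroadcast
    // queue.
    func getBroadcasts(local, rebroadcast *memberlist.TransmitLimitedQueue,
        overhead, limit int,
    ) [][]byte {
        msgs := local.GetBroadcasts(overhead, limit)
        used := 0
        for _, m := range msgs {
            used += len(m) + overhead
        }
        if remaining := limit - used; remaining > 0 {
            msgs = append(msgs, rebroadcast.GetBroadcasts(overhead, remaining)...)
        }
        return msgs
    }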
Signed-off-by: Cory Snider <csnider@mirantis.com>
Log more details when assertions fail to provide a more complete picture
of what went wrong when TestCRUDTableEntries fails. Log the state of
each NetworkDB instance at various points in TestCRUDTableEntries to
provide an even more complete picture.
Increase the global logger verbosity in tests so warnings and debug logs
are printed to the test log.
Signed-off-by: Cory Snider <csnider@mirantis.com>
NetworkDB uses a multi-dimensional map of struct network values to keep track of
network attachments for both remote nodes and the local node. Only a
subset of the struct fields are used for remote nodes' network
attachments. The tableBroadcasts pointer field in particular is
always initialized for network values representing local attachments
(read: nDB.networks[nDB.config.NodeID]) and always nil for remote
attachments. Consequently, unnecessary defensive nil-pointer checks are
peppered throughout the code despite the aforementioned invariant.
Enshrine in the type system the invariant that tableBroadcasts is
initialized iff the network attachment is for the local node. Pare down
struct network to only the fields needed for remote network attachments
and move the local-only fields into a new struct thisNodeNetwork. Elide
the unnecessary nil-checks.
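A rough sketch of the resulting shape (field names and types are
illustrative, not the exact NetworkDB definitions; serf.LamportTime and
memberlist.TransmitLimitedQueue are the hashicorp types assumed here):

    // network holds only what is needed to track a remote node's
    // attachment.
    type network struct {
        ltime    serf.LamportTime
        leaving  bool
        reapTime time.Duration
    }

    // thisNodeNetwork embeds network and adds the local-only state, so
    // tableBroadcasts is non-nil by construction and needs no nil checks.
    type thisNodeNetwork struct {
        network
        tableBroadcasts *memberlist.TransmitLimitedQueue
    }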
Signed-off-by: Cory Snider <csnider@mirantis.com>
When joining a network that was previously joined but not yet reaped,
NetworkDB replaces the network struct value with a zeroed-out one,
copying over only the entries count. This is also the case when joining a
network that is currently joined! Consequently, joining a network has
the side effect of clearing the broadcast queue. If the queue is cleared
while messages are still pending broadcast, convergence may be delayed
until the next bulk sync cycle.
Make it an error to join a network twice without leaving. Retain the
existing broadcast queue when rejoining a network that has not yet been
reaped.
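A minimal sketch of the resulting join logic under the constraints
described above (the thisNodeNetworks map, newThisNodeNetwork helper and
field names are illustrative):

    func (nDB *NetworkDB) joinNetwork(nid string) error {
        n, ok := nDB.thisNodeNetworks[nid]
        switch {
        case ok && !n.leaving:
            return fmt.Errorf("network %s is already joined", nid)
        case ok:
            // Rejoining before the previous attachment was reaped: keep
            // the existing broadcast queue so pending messages are not
            // dropped.
            n.leaving = false
            n.reapTime = 0
        default:
            nDB.thisNodeNetworks[nid] = newThisNodeNetwork(nid)
        }
        // ... broadcast the join event and trigger a bulk sync as before ...
        return nil
    }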
Signed-off-by: Cory Snider <csnider@mirantis.com>
The map key for nDB.networks is the network ID, so the corresponding
struct field is not actually used anywhere in practice.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Also add test cases for:
- empty options for all fields
- invalid nameServer (domain instead of IP).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Gracefully leaving the memberlist cluster is a best-effort operation.
Failing to successfully broadcast the leave message to a peer should not
prevent NetworkDB from cleaning up the memberlist instance on close. But
that was not the case in practice. Log the error returned from
(*memberlist.Memberlist).Leave instead of returning it, and proceed with
shutting down regardless.
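A minimal sketch of the intended shutdown sequence (the wrapper function,
logging call and timeout value are illustrative; Leave and Shutdown are
the memberlist methods named above):

    // closeMemberlist is an illustrative wrapper; leaving is best-effort,
    // so a Leave failure is logged and shutdown continues.
    func closeMemberlist(mlist *memberlist.Memberlist) error {
        if err := mlist.Leave(10 * time.Second); err != nil {
            log.Printf("networkdb: failed to broadcast memberlist leave message: %v", err)
        }
        return mlist.Shutdown()
    }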
Signed-off-by: Cory Snider <csnider@mirantis.com>
The (*driver).Join function does many things to set up overlay
networking. One of the first things it does is call
(*network).joinSandbox, which in turn calls (*driver).initSandboxPeerDB.
The initSandboxPeerDB function iterates through the peer db to add
entries to the VXLAN FDB, neighbor table and IPsec security association
database in the kernel for all known peers on the overlay network.
One of the last things the (*driver).Join function does is call
(*driver).initEncryption. The initEncryption function iterates through
the peer db to add entries to the IPsec security association database in
the kernel for all known peers on the overlay network. But the preceding
initSandboxPeerDB call already did that! The initEncryption function is
redundant and can safely be removed.
Signed-off-by: Cory Snider <csnider@mirantis.com>
In addition to being three functions in a trenchcoat, the
checkEncryption function has a very subtle implementation which is
difficult to reason about. That is not a good property for security
relevant code to have.
Replace two of the three calls to checkEncryption with conditional calls
to setupEncryption and removeEncryption, lifting the conditional logic
which was hidden away in checkEncryption into the call sites to make it
easier to reason about the code. Replace the third call with a call to a
new initEncryption function.
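A sketch of what a lifted call site looks like (the secure flag, error
handling and method names follow this series but are illustrative here):

    // The call site now states the intent directly instead of hiding the
    // branch inside checkEncryption. n, add and remoteIP come from the
    // surrounding (hypothetical) context.
    if n.secure {
        var err error
        if add {
            err = d.setupEncryption(remoteIP)
        } else {
            err = d.removeEncryption(remoteIP)
        }
        if err != nil {
            log.Printf("overlay: updating encryption for %s: %v", remoteIP, err)
        }
    }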
Signed-off-by: Cory Snider <csnider@mirantis.com>
The setupEncryption and removeEncryption functions take several
parameters, but all call sites pass the same values for all the
parameters aside from remoteIP: values taken from fields of the driver
struct. Refactor these functions to be methods of the driver struct and
drop the redundant parameters.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Since it is not meaningful to add or remove encryption between the local
node and itself, the isLocal parameter is redundant. Setting up
encryption for all network peers is now invoked by calling
checkEncryption(nid, netip.Addr{}, true).
What was previously a call to checkEncryption with isLocal=true,
add=false is now more explicitly a no-op. It always was effectively a
no-op, but that was not
easy to spot by inspection. In the world with the isLocal flag,
calls to checkEncryption where isLocal=true and add=false would have rIP
set to d.advertiseAddr. In other words, it was a request to remove
encryption parameters between the local peer and itself if peerDB had no
remote-peer entries for the network. So either the call would do
nothing, or it would remove encryption parameters that aren't used for
anything. Now the equivalent call always does nothing.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Drop the isLocal boolean parameters from the peerDB functions. Local
peers have vtep == netip.Addr{}.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The VTEP value for a peer in peerDB is only accurate for a remote peer.
The VTEP for a local peer would be the driver's advertise address, which
is not necessarily constant for the lifetime of the driver instance.
The VTEP values persisted in the peerDB entries for local peers could be
stale or missing if not kept in sync with the advertise address. And the
peerDB could get polluted with duplicate entries for local peers if the
advertise address was to change, as entries which differ only by VTEP
are considered distinct by SetMatrix. Persisting the advertise address
as the VTEP for local peers creates lots of problems that are not easy
to solve.
Stop persisting the VTEP for local peers in peerDB. Any code that needs
to know the VTEP for local peers can look that up from the source of
truth: the driver's advertise address. Use the lack of a VTEP in peerDB
entries to signify local peers, making the isLocal flag redundant.
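A minimal sketch of the convention (peerEntry and its vtep field are
illustrative names):

    // A local peer is identified by the absence of a VTEP; the zero
    // netip.Addr is not a valid address, so IsValid reports false for it.
    func (p peerEntry) isLocal() bool {
        return !p.vtep.IsValid()
    }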
Signed-off-by: Cory Snider <csnider@mirantis.com>
The overlay driver's checkEncryption function configures the IPsec
parameters for the VXLAN tunnels to peer nodes. When called with
isLocal=true, it configures encryption for all peer nodes with at least
one peerDB entry. Since the local peers are also included in the peerDB,
it needs to filter those entries out. It does so by filtering out any
peer entries whose VTEP address is equal to the current local advertise
address. Trouble is, the local advertise address is not necessarily
constant. The driver tries to handle this case by calling
peerDBUpdateSelf() when the advertise address changes. This function
iterates through the peerDB and tries to update the VTEP address for all
local peer entries, but it does not actually do anything: it mutates a
temporary copy of the entry which is not persisted back into the peerDB.
(It used to be functional, but was broken when the peerDB was extended
to use SetMatrix.) So there may be cases where local peer entries are
not filtered out properly, resulting in spurious encryption parameters
being programmed into the kernel.
Filter out local peers when walking the peerDB by checking whether the
entry has the isLocal flag set. Remove the no-op code which attempts
to update local entries in the peerDB. No other code takes any interest
in the VTEP value for isLocal peer entries.
Signed-off-by: Cory Snider <csnider@mirantis.com>
We set SO_REUSEADDR on sockets used for host port mappings by
docker-proxy - which means it's possible to bind the same port
on a specific address as well as 0.0.0.0/::.
For TCP sockets, an error is raised when listen() is called on
both sockets - and the port allocator will be called again to
avoid the clash (if the port was allocated from a range, otherwise
the container will just fail to start).
But, for UDP sockets, there's no listen() - so take more care
to avoid the clash in the portallocator.
The port allocator keeps a set of allocated ports for each of
the host IP addresses it's seen, including 0.0.0.0/::. So, if a
mapping to 0.0.0.0/:: is requested, find a port that's free in
the range for each of the known IP addresses (but still only
mark it as allocated against 0.0.0.0/::). And, if a port is
requested for specific host addresses, make sure it's also
free in the corresponding 0.0.0.0/:: set (but only mark it as
allocated against the specific addresses - because the same
port can be allocated against a different specific address).
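A minimal sketch of the availability check described above, assuming the
allocator tracks a set of allocated ports per host address (the types and
names are illustrative, not the actual portallocator API):

    // allocator tracks allocated ports per host address, including the
    // unspecified addresses 0.0.0.0 and ::.
    type allocator struct {
        allocated map[netip.Addr]map[int]bool
    }

    // portAvailable reports whether port can be allocated for ip without
    // clashing under the rules described above. It only checks; marking
    // the port as allocated is done separately, against the requested
    // address only.
    func (pa *allocator) portAvailable(ip netip.Addr, port int) bool {
        if ip.IsUnspecified() {
            // A 0.0.0.0/:: mapping must be free on every known host
            // address.
            for _, ports := range pa.allocated {
                if ports[port] {
                    return false
                }
            }
            return true
        }
        // A specific-address mapping must be free on that address and
        // must not clash with an existing 0.0.0.0/:: allocation in the
        // same address family.
        unspec := netip.IPv4Unspecified()
        if ip.Is6() {
            unspec = netip.IPv6Unspecified()
        }
        return !pa.allocated[ip][port] && !pa.allocated[unspec][port]
    }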
Signed-off-by: Rob Murray <rob.murray@docker.com>