InTheForest/moby

mirror of https://github.com/moby/moby.git synced 2026-01-11 18:51:37 +00:00

Author	SHA1	Message	Date
Cory Snider	bace1b8a3b	libnetwork/d/overlay: handle coalesced peer updates The eventually-consistent nature of NetworkDB means we cannot depend on events being received in the same order that they were sent. Nor can we depend on receiving events for all intermediate states. It is possible for a series of entry UPDATEs, or a DELETE followed by a CREATE with the same key, to get coalesced into a single UPDATE event on the receiving node. Watchers of NetworkDB tables therefore need to be prepared to gracefully handle arbitrary UPDATEs of a key, including those where the new value may have nothing in common with the previous value. The overlay driver naively handled events for overlay_peer_table assuming that an endpoint leave followed by a rejoin of the same endpoint would always be expressed as a DELETE event followed by a CREATE. It would handle a coalesced UPDATE as a CREATE, inserting a new entry into peerDB without removing the old one. This would have various side effects, such as having the "transient state" of multiple entries in peerDB with the same peer IP never settle. Update driverapi to pass both the previous and new value of a table entry into the driver. Modify the overlay driver to handle an UPDATE by removing the previous peer entry from peerDB then adding the new one. Modify the Windows overlay driver to match. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `e1a586a9a7`) libn/d/overlay: don't deref nil PeerRecord on error If unmarshaling the peer record fails, there is no need to check if it's a record for a local peer. Attempting to do so anyway will result in a nil-dereference panic. Don't do that. The Windows overlay driver has a typo: prevPeer is being checked twice for whether it was a local-peer record. Check prevPeer once and newPeer once each, as intended. 
Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `12c6345d3a`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	f9e54290b5	libn/d/win/overlay: dedupe NetworkDB definitions Windows and Linux overlay driver instances are interoperable, working from the same NetworkDB table for peer discovery. As both drivers produce and consume serialized data through the table, they both need to have a shared understanding of the shape and semantics of that data. The Windows overlay driver contains a duplicate copy of the protobuf definitions used for marshaling and unmarshaling the NetworkDB peer entries for dubious reasons. It gives us the flexibility to have the definitions diverge, which is only really useful for shooting ourselves in the foot. Make libnetwork/drivers/overlay the source of truth for the peer record definitions and the name of the NetworkDB table for distributing peer records. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `8340e109de`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	fc3df55230	libn/d/overlay: extract hashable address types The macAddr and ipmac types are generally useful within libnetwork. Move them to a dedicated package and overhaul the API to be more like that of the net/netip package. Update the overlay driver to utilize these types, adapting to the new API. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `c7b93702b9`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	b22872af60	libnetwork/driverapi: make EventNotify optional Overlay is the only driver which makes use of the EventNotify facility, yet all other driver implementations are forced to provide a stub implementation. Move the EventNotify and DecodeTableEntry methods into a new optional TableWatcher interface and remove the stubs from all the other drivers. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `844023f794`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	d60c71a9d7	libnetwork/d/overlay: fix logical race conditions The concurrency control in the overlay driver is logically unsound. While the use of mutexes is sufficient to prevent data races -- violations of the Go memory model -- many operations which need to be atomic are performed with unbounded concurrency. Overhaul the use of locks in the overlay network driver. Implement sound locking at the network granularity: operations may proceed concurrently iff they are being applied to distinct networks. Push the responsibility of locking up to the code which calls methods or accesses struct fields to avoid deadlock situations like we had previously with d.initSandboxPeerDB() and to make the code easier to reason about. Each overlay network has a distinct peer db. The NetworkDB watch for the overlay peer table for the network will only start after (driver).CreateNetwork returns and will be stopped before libnetwork calls (driver).DeleteNetwork, therefore the lifetime of the peer db for a network is constrained to the lifetime of the network itself. Yet the peer db for a network is tracked in a dedicated map, separately from the network objects themselves. This has resulted in a parallel set of mutexes to manage concurrency of the peer db distinct from the mutexes for the driver and networks. Move the peer db for a network into a field of the network struct and guard it from concurrent access using the per-network lock. Move the methods for manipulating the peer db into the network struct so that the methods can only be called if the caller has a reference to the network object. Network creation and deletion are synchronized using the driver-scope mutex, but some of the kernel programming is performed outside of the critical section. It is possible for network deletion to race with recreating the network, interleaving the kernel programming for the network creation and deletion, resulting in inconsistent kernel state. 
Parallelize network creation and deletion soundly. Use a double-checked locking scheme to soundly handle the case of concurrent CreateNetwork and DeleteNetwork for the same network id without blocking operations on other networks. Synchronize operations on a network so that operations on the network such as adding a neighbor to the peer db are performed atomically, not interleaved with deleting the network. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `89d3419093`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	ad54b8f9ce	libn/d/overlay: fix encryption race conditions There is a dedicated mutex for synchronizing access to the encrMap. Separately, the main driver mutex is used for synchronizing access to the encryption keys. Their use is sufficient to prevent data races (if used correctly, which is not the case) but not logical race conditions. Programming the encryption parameters for a peer can race with encryption keys being updated, which could lead to inconsistencies between the parameters programmed into the kernel and the desired state. Introduce a new mutex for synchronizing encryption operations. Use that mutex to synchronize access to both encrMap and keys. Handle encryption key updates in a critical section so they can no longer be interleaved with kernel programming of encryption parameters. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `843cd96725`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	8075689abd	libn/d/overlay: inline secMapWalk into only caller func (driver) secMapWalk is a curious beast. It is named walk, yet it also mutates the collection being iterated over. It returns an error, but that error is always nil. It takes a callback that can break iteration, yet the only caller makes no use of that affordance. Its utility is limited and the abstraction hinders readability more than it helps. Open-code the d.secMap.nodes loop into func (driver) updateKeys(), the only caller. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `a1d299749c`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	480dfaef06	libnetwork/d/overlay: un-embed mutexes It is easier to find all references when they are struct fields rather than embedded structs. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `74713e1a7d`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	e604d70e22	libnetwork/d/overlay: ref-count encryption params The IPsec encryption parameters (Security Association Database and Security Policy Database entries) for a particular overlay network peer (VTEP) are shared global state as they have to be programmed into the root network namespace. The same parameters are used when encrypting VXLAN traffic to a particular VTEP for all overlay networks. Deleting the entries for a VTEP will break encryption to that VTEP across all encrypted overlay networks, therefore the decision of when to delete the entries must take the state of all overlay networks into account. Unfortunately this is not the case. The overlay driver uses local per-network state to decide when to program and delete the parameters for a VTEP. In practice, the parameters for all VTEPs participating in an encrypted overlay network are deleted when the network is deleted. Encryption to that VTEP over all other active encrypted overlay networks would be broken until some other incidental peerDB event triggered a re-programming of the parameters for that VTEP. Change the setupEncryption and removeEncryption functions to be reference-counted. The removeEncryption function needs to be called the same number of times as addEncryption before the parameters are deleted from the kernel. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `057e35dd65`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Sebastiaan van Stijn	b6b13b20af	libnetwork/drivers/overlay: fix naked returns, output variables libnetwork/drivers/overlay/encryption.go:370:2: naked return in func `programSA` with 64 lines of code (nakedret) return ^ Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit `02b4c7cc52`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	b539aea3cd	libnetwork/d/overlay: properly model peer db The overlay driver assumes that the peer table in NetworkDB will always converge to a 1:1:1 mapping from peer endpoint IP address to MAC address to VTEP. While this currently holds true in practice most of the time, it is not an invariant and there are ways that users can violate this assumption. The driver detects whether peer entries conflict with each other by matching up (IP, MAC) tuples. In the common case this works out fine as the MAC address for an endpoint is generally derived from the assigned IP address. If an IP address gets reassigned to a container on another node the MAC address will follow, so the driver's conflict resolution logic will behave as intended. However users may explicitly configure the MAC address for a container's network endpoints. If an IP address gets reassigned from a container with an auto-generated MAC address to a container with a manually-configured MAC, or vice versa, the driver would not detect the conflict as the (IP, MAC) tuples won't match up. It would attempt to program the kernel's neighbor table with two conflicting MAC addresses for one IP, which will fail. And since it does not realize that there is a conflict, the driver won't reprogram the kernel from the remaining entry when the other entry is deleted. The assumption that only one IP address may resolve to a given MAC address is violated if multiple IP addresses are assigned to an endpoint. This rarely comes up in practice today as the overlay driver only supports IPv4 single-stack connectivity for endpoints. If multiple distinct peer entries exist with the same MAC address, the driver will delete the MAC->VTEP mapping from the kernel's forwarding database when any entry is deleted, even if other entries remain active. This limitation is one of the biggest obstacles in the way of supporting IPv6 and dual-stack connectivity for endpoints attached to overlay networks. 
Modify the peer db logic to correctly handle the cases where peer entries have non-unique MAC or VTEP values. Treat any set of entries with non-unique IP addresses as a conflict, irrespective of the entries' MAC addresses. Maintain a reference count of forwarding database entries and only delete the MAC->VTEP mapping from the kernel when there are no longer any neighbor entries which resolve to that MAC. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `1c2b744ca2`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	e43e322a3b	libnetwork/d/overlay: refactor peer db impl The peer db implementation is more complex than it needs to be. Notably, the peerCRUD / peerCRUDOp function split is a vestige of its evolution from a worker goroutine receiving commands over a channel. Refactor the peer db operations to be easier to read, understand and modify. Factor the kernel-programming operations out into dedicated addNeighbor and deleteNeighbor functions. Inline the rest of the peerCRUDOp functions into their respective peerCRUD wrappers. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `59437f56f9`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	89ea2469df	libnetwork/d/overlay: drop initEncryption function The (driver).Join function does many things to set up overlay networking. One of the first things it does is call (network).joinSandbox, which in turn calls (driver).initSandboxPeerDB. The initSandboxPeerDB function iterates through the peer db to add entries to the VXLAN FDB, neighbor table and IPsec security association database in the kernel for all known peers on the overlay network. One of the last things the (driver).Join function does is call (*driver).initEncryption. The initEncryption function iterates through the peer db to add entries to the IPsec security association database in the kernel for all known peers on the overlay network. But the preceding initSandboxPeerDB call already did that! The initEncryption function is redundant and can safely be removed. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `df6b405796`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	f69e64ab12	libnetwork/d/overlay: drop checkEncryption function In addition to being three functions in a trenchcoat, the checkEncryption function has a very subtle implementation which is difficult to reason about. That is not a good property for security relevant code to have. Replace two of the three calls to checkEncryption with conditional calls to setupEncryption and removeEncryption, lifting the conditional logic which was hidden away in checkEncryption into the call sites to make it easier to reason about the code. Replace the third call with a call to a new initEncryption function. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `713f887698`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	67fbdf3c28	libnetwork/d/overlay: make setupEncryption a method The setupEncryption and removeEncryption functions take several parameters, but all call sites pass the same values for all the parameters aside from remoteIP: values taken from fields of the driver struct. Refactor these functions to be methods of the driver struct and drop the redundant parameters. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `cb4e7b2f03`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	33a7e83e6d	libnetwork/d/overlay: checkEncryption: drop isLocal param Since it is not meaningful to add or remove encryption between the local node and itself, the isLocal parameter is redundant. Setting up encryption for all network peers is now invoked by calling checkEncryption(nid, netip.Addr{}, true) Calling checkEncryption with isLocal=true, add=false is now more explicitly a no-op. It always was effectively a no-op, but that was not easy to spot by inspection. In the world with the isLocal flag, calls to checkEncryption where isLocal=true and add=false would have rIP set to d.advertiseAddr. In other words, it was a request to remove encryption parameters between the local peer and itself if peerDB had no remote-peer entries for the network. So either the call would do nothing, or it would remove encryption parameters that aren't used for anything. Now the equivalent call always does nothing. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `0d893252ac`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	684b2688d2	libnetwork/d/overlay: peerdb: drop isLocal param Drop the isLocal boolean parameters from the peerDB functions. Local peers have vtep == netip.Addr{}. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `4b1c1236b9`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	b61930cc82	libnetwork/d/overlay: elide vtep for local peers The VTEP value for a peer in peerDB is only accurate for a remote peer. The VTEP for a local peer would be the driver's advertise address, which is not necessarily constant for the lifetime of the driver instance. The VTEP values persisted in the peerDB entries for local peers could be stale or missing if not kept in sync with the advertise address. And the peerDB could get polluted with duplicate entries for local peers if the advertise address was to change, as entries which differ only by VTEP are considered distinct by SetMatrix. Persisting the advertise address as the VTEP for local peers creates lots of problems that are not easy to solve. Stop persisting the VTEP for local peers in peerDB. Any code that needs to know the VTEP for local peers can look that up from the source of truth: the driver's advertise address. Use the lack of a VTEP in peerDB entries to signify local peers, making the isLocal flag redundant. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `48e0b24ff7`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	1db0510301	libnetwork/d/overlay: filter local peers explicitly The overlay driver's checkEncryption function configures the IPSec parameters for the VXLAN tunnels to peer nodes. When called with isLocal=true, it configures encryption for all peer nodes with at least one peerDB entry. Since the local peers are also included in the peerDB, it needs to filter those entries out. It does so by filtering out any peer entries whose VTEP address is equal to the current local advertise address. Trouble is, the local advertise address is not necessarily constant. The driver tries to handle this case by calling peerDBUpdateSelf() when the advertise address changes. This function iterates through the peerDB and tries to update the VTEP address for all local peer entries, but it does not actually do anything: it mutates a temporary copy of the entry which is not persisted back into the peerDB. (It used to be functional, but was broken when the peerDB was extended to use SetMatrix.) So there may be cases where local peer entries are not filtered out properly, resulting in spurious encryption parameters being programmed into the kernel. Filter out local peers when walking the peerDB by filtering on whether the entry has the isLocal flag set. Remove the no-op code which attempts to update local entries in the peerDB. No other code takes any interest in the VTEP value for isLocal peer entries. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `a9e2d6d06e`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	9ff06c515c	libn/d/overlay: use netip types more The netip types are really useful for tracking state in the overlay driver as they are hashable, unlike net.IP and friends, making them directly useable as map keys. Converting between netip and net types is fairly trivial, but fewer conversions is more ergonomic. The NetworkDB entries for the overlay peer table encode the IP addresses as strings. We need to parse them to some representation before processing them further. Parse directly into netip types and pass those values around to cut down on the number of conversions needed. The peerDB needs to marshal the keys and entries to structs of hashable values to be able to insert them into the SetMatrix. Use netip.Addr in peerEntry so that peerEntry values can be directly inserted into the SetMatrix without conversions. Use a hashable struct type as the SetMatrix key to avoid having to marshal the whole struct to a string and parse it back out. Use netip.Addr as the map key for the driver's encryption map so the values do not need to be converted to and from strings. Change the encryption configuration methods to take netip types so the peerDB code can pass netip values directly. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `d188df0039`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	8f0a803fc6	libnetwork/internal/setmatrix: make keys generic Make the SetMatrix key's type generic so that e.g. netip.Addr values can be used as matrix keys. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `0317f773a6`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	7d8c7c21f2	libnetwork/osl: stop tracking neighbor entries The Namespace keeps some state for each inserted neighbor-table entry which is used to delete the entry (and any related entries) given only the IP and MAC address of the entry to delete. This state is not strictly required as the retained data is a pure function of the parameters passed to AddNeighbor(), and the kernel can inform us whether an attempt to add a neighbor entry would conflict with an existing entry. Get rid of the neighbor state in Namespace. It's just one more piece of state that can cause lots of grief if it falls out of sync with ground truth. Require callers to call DeleteNeighbor() with the same arguments as they had passed to AddNeighbor(). Push the responsibility for detecting attempts to insert conflicting entries into the neighbor table onto the kernel by using (*netlink.Handle).NeighAdd() instead of NeighSet(). Modernize the error messages and logging in DeleteNeighbor() and AddNeighbor(). Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `0d6e7cd983`) libn/d/overlay: delete FDB entry from AF_BRIDGE Starting with commit `0d6e7cd983` DeleteNeighbor() needs to be called with the same options as the AddNeighbor() call that created the neighbor entry. The calls in peerdb were modified incorrectly, resulting in the deletes failing and leaking neighbor entries. Fix up the DeleteNeighbor calls so that the FDB entry is deleted from the FDB instead of the neighbor table, and the neighbor is deleted from the neighbor table instead of the FDB. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `7a12bbe5d3`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	e5b652add3	libn/osl: drop unused AddNeighbor force parameter func (*Namespace) AddNeighbor is only ever called with the force parameter set to false. Remove the parameter and eliminate dead code. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `3bdf99d127`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:25 -04:00
Cory Snider	ca41647695	libn/d/overlay: drop miss flags from peerAddOp as all callers unconditionally set them to false. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `a8e8a4cdad`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:24 -04:00
Cory Snider	199b2496e7	libnetwork/d/overlay: drop miss flags from peerAdd as all callers unconditionally set them to false. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `6ee58c2d29`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:24 -04:00
Cory Snider	65ec8c89a6	libn/d/overlay: drop obsolete writeToStore comment The writeToStore() call was removed from CreateNetwork in commit `0fa873c0fe`. The comment about undoing the write is no longer applicable. Signed-off-by: Cory Snider <csnider@mirantis.com> (cherry picked from commit `d90277372f`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-08-11 15:13:24 -04:00
Cory Snider	59f062b233	Merge pull request #50511 from corhere/backport-25.0/libn/all-the-networkdb-fixes [25.0] libnetwork/networkdb: backport all the fixes	2025-08-07 11:44:08 -04:00
Rob Murray	651b2feb27	Restore INC iptables rules on firewalld reload Signed-off-by: Rob Murray <rob.murray@docker.com>	2025-07-28 12:06:58 -04:00
Matthieu MOREL	bacba3726f	fix redefines-builtin-id from revive Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>	2025-07-25 16:20:29 -04:00
Andrey Epifanov	fb6695de75	libnetwork: refactor iptable functions to include table parameter for improved rule management Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com> (cherry picked from commit `19a8083866`) Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>	2025-07-25 15:15:50 -04:00
Rob Murray	fbffa88b76	Restore legacy links along with other iptables rules On firewalld reload, all the iptables rules are deleted. Legacy links use iptables.OnReloaded to achieve that - but there's no way to delete an OnReloaded callback. So, a firewalld reload after the linked containers are deleted results in zombie rules being re-created. Legacy links are created by ProgramExternalConnectivity, but removed in Leave (rather than RevokeExternalConnectivity). So, restore legacy links for current endpoints, along with the other per-network/per-port rules. Move link-removal to RevokeExternalConnectivity, so that it happens with the configNetwork lock held. Signed-off-by: Rob Murray <rob.murray@docker.com>	2025-07-25 15:15:50 -04:00
Rob Murray	41f080df25	Restore iptables for current networks on firewalld reload Using iptables.OnReloaded to restore individual per-network rules on firewalld reload means rules for deleted networks pop back in to existence (because there was no way to delete the callbacks on network-delete). So, on firewalld reload, walk over current networks and ask them to restore their iptables rules. Signed-off-by: Rob Murray <rob.murray@docker.com> (cherry picked from commit `a527e5a546`) Test that firewalld reload doesn't re-create deleted iptables rules Signed-off-by: Rob Murray <rob.murray@docker.com> (cherry picked from commit `c3fa7c1779`) Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>	2025-07-25 15:15:50 -04:00
Matthieu MOREL	e53cf6bc02	fix(ST1016): Use consistent method receiver names Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com> Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit `70139978d3`) Signed-off-by: Cory Snider <csnider@mirantis.com>	2025-07-24 14:01:41 -04:00
Sebastiaan van Stijn	c231772a5c	update go:build tags to go1.23 to align with vendor.mod Go maintainers started to unconditionally update the minimum go version for golang.org/x/ dependencies to go1.23, which means that we'll no longer be able to support any version below that when updating those dependencies; > all: upgrade go directive to at least 1.23.0 [generated] > > By now Go 1.24.0 has been released, and Go 1.22 is no longer supported > per the Go Release Policy (https://go.dev/doc/devel/release#policy). > > For golang/go#69095. This updates our minimum version to go1.23, as we won't be able to maintain compatibility with older versions because of the above. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit `7c52c4d92e`) Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com> # Conflicts: # api/server/router/container/inspect.go # api/server/router/grpc/grpc.go # api/server/router/system/system.go # api/server/router/system/system_routes.go # api/types/registry/registry.go # api/types/registry/registry_test.go # builder/builder-next/adapters/containerimage/pull.go # container/view.go # daemon/container_operations.go # daemon/containerd/image_inspect.go # daemon/containerd/image_push_test.go # daemon/create.go # daemon/daemon.go # daemon/daemon_unix.go # daemon/info.go # daemon/inspect.go # daemon/logger/loggerutils/logfile.go # internal/gocompat/modulegenerator.go # internal/maputil/maputil.go # internal/platform/platform_linux.go # internal/sliceutil/sliceutil.go # libnetwork/config/config.go # libnetwork/drivers/bridge/port_mapping_linux.go # libnetwork/drivers/overlay/peerdb.go # libnetwork/endpoint.go # libnetwork/endpoint_store.go # libnetwork/internal/l2disco/unsol_arp_linux.go # libnetwork/internal/l2disco/unsol_na_linux.go # libnetwork/internal/nftables/nftables_linux.go # libnetwork/internal/resolvconf/resolvconf.go # libnetwork/internal/setmatrix/setmatrix.go # libnetwork/ipams/defaultipam/address_space.go # 
libnetwork/ipamutils/utils.go # libnetwork/iptables/iptables.go # libnetwork/netutils/utils_linux.go # libnetwork/network.go # libnetwork/network_store.go # libnetwork/networkdb/networkdb.go # libnetwork/options/options.go # libnetwork/osl/interface_linux.go # libnetwork/osl/route_linux.go # libnetwork/portallocator/portallocator.go # libnetwork/sandbox.go # libnetwork/service.go # oci/defaults.go # plugin/v2/plugin_linux.go # testutil/daemon/daemon.go # testutil/helpers.go	2025-05-13 08:50:07 -07:00
Sebastiaan van Stijn	eacc3610f9	libnetwork/drivers/bridge: setupIPChains: fix defer checking wrong err The output variable was renamed in `0503cf2510`, but that commit failed to change this defer, which was now checking the wrong error. Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit `01a55860c6`) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2024-12-16 16:54:55 +01:00
Sebastiaan van Stijn	b536253047	libnetwork/drivers/bridge: fix non-constant format string in call (govet) libnetwork/drivers/bridge/setup_ip_tables_linux.go:385:23: printf: non-constant format string in call to fmt.Errorf (govet) return fmt.Errorf(msg) ^ Signed-off-by: Sebastiaan van Stijn <github@gone.nl> (cherry picked from commit `068c1bf3be`) Signed-off-by: Austin Vazquez <macedonv@amazon.com>	2024-09-04 02:51:17 +00:00
Rob Murray	c761353e7c	Make 'internal' bridge networks accessible from host Prior to release 25.0.0, the bridge in an internal network was assigned an IP address - making the internal network accessible from the host, giving containers on the network access to anything listening on the bridge's address (or INADDR_ANY on the host). This change restores that behaviour. It does not restore the default route that was configured in the container, because packets sent outside the internal network's subnet have always been dropped. So, a 'connect()' to an address outside the subnet will still fail fast. Signed-off-by: Rob Murray <rob.murray@docker.com> (cherry picked from commit `419f5a6372`) Signed-off-by: Albin Kerouanton <albinker@gmail.com>	2024-03-01 09:29:41 +01:00
Albin Kerouanton	7a659049b8	libnet: bridge: ignore EINVAL when configuring bridge MTU Since `964ab7158c`, we explicitly set the bridge MTU if it was specified. Unfortunately, kernels <v4.17 have a check preventing us from manually setting the MTU to anything greater than 1500 if no link is attached to the bridge, which is how we do things -- create the bridge, set its MTU and later on, attach veths to it. Relevant kernel commit: `804b854d37` As we still have to support CentOS/RHEL 7 (and their old v3.10 kernels) for a few more months, we need to ignore EINVAL if the MTU is > 1500 (but <= 65535). Signed-off-by: Albin Kerouanton <albinker@gmail.com> (cherry picked from commit `89470a7114`) Signed-off-by: Albin Kerouanton <albinker@gmail.com>	2024-02-02 19:51:11 +01:00
Rob Murray	990e95dcf0	Add internal n/w bridge to firewalld docker zone Containers attached to an 'internal' bridge network are unable to communicate when the host is running firewalld. Non-internal bridges are added to a trusted 'docker' firewalld zone, but internal bridges were not. DOCKER-ISOLATION iptables rules are still configured for an internal network, they block traffic to/from addresses outside the network's subnet. Signed-off-by: Rob Murray <rob.murray@docker.com> (cherry picked from commit `2cc627932a`) Signed-off-by: Albin Kerouanton <albinker@gmail.com>	2024-02-02 08:38:44 +01:00
Albin Kerouanton	b9e27acabc	libnet/d/bridge: dead code: no conflict on stale default nw A check was added to the bridge driver to detect when it was called to create the default bridge nw whereas a stale default bridge already existed. In such a case, the bridge driver was deleting the stale network before re-creating it. This check was introduced in docker/libnetwork@6b158eac6a to fix an issue related to newly introduced live-restore. However, since commit docker/docker@ecffb6d58c, the daemon doesn't even try to create default networks if there are active sandboxes (i.e. due to live-restore). Thus, it is now impossible for the default bridge network to be stale and to exist when the driver's CreateNetwork() method is called. As such, the check introduced in the first commit mentioned above is dead code and can be safely removed. Signed-off-by: Albin Kerouanton <albinker@gmail.com>	2024-01-04 11:50:04 +01:00
Albin Kerouanton	0a26cdf344	libnet/d/bridge: remove dead ActiveEndpointsError This error is unused since docker/libnetwork@6b158eac6. Signed-off-by: Albin Kerouanton <albinker@gmail.com>	2024-01-04 11:12:53 +01:00
Sebastiaan van Stijn	84ba2558e2	Merge pull request #46976 from robmry/bridge_todos Validate IPv6 address in libnetwork's bridge driver, remove unused error types.	2024-01-02 16:03:16 +01:00
Sebastiaan van Stijn	4f9db655ed	portmapper: move userland-proxy lookup to daemon config When mapping a port with the userland-proxy enabled, the daemon would perform an "exec.LookPath" for every mapped port (which, in case of a range of ports, would be for every port in the range). This was both inefficient (looking up the binary for each port), inconsistent (when running in rootless-mode, the binary was looked-up once), as well as inconvenient, because a missing binary, or a mis-configured userland-proxy-path would not be detected at daemon startup, and not produce an error until starting the container; docker run -d -P nginx:alpine 4f7b6589a1680f883d98d03db12203973387f9061e7a963331776170e4414194 docker: Error response from daemon: driver failed programming external connectivity on endpoint romantic_wiles (7cfdc361821f75cbc665564cf49856cf216a5b09046d3c22d5b9988836ee088d): fork/exec docker-proxy: no such file or directory. However, the container would still be created (but invalid); docker ps -a CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 869f41d7e94f nginx:alpine "/docker-entrypoint.…" 10 seconds ago Created romantic_wiles This patch changes how the userland-proxy is configured; - The path of the userland-proxy is now looked up / configured at daemon startup; this is similar to how the proxy is configured in rootless-mode. - A warning is logged when failing to lookup the binary. - If the daemon is configured with "userland-proxy" enabled, an error is produced, and the daemon will refuse to start. - The "proxyPath" argument for newProxyCommand() (in libnetwork/portmapper) is now required to be set. It no longer looks up the executable, and produces an error if no path was provided. While this change was not required, it makes the daemon config the canonical source of truth, instead of logic spread across multiple locations. 
Some of this logic is a change of behavior, but these changes were made with the assumption that we don't want to support; - installing the userland proxy _after_ the daemon was started - moving the userland proxy (or installing a proxy with a higher preference in PATH) With this patch: Validating the config produces an error if the binary is not found: dockerd --validate WARN[2023-12-29T11:36:39.748699591Z] failed to lookup default userland-proxy binary error="exec: \"docker-proxy\": executable file not found in $PATH" userland-proxy is enabled, but userland-proxy-path is not set Disabling userland-proxy prints a warning, but validates as "OK": dockerd --userland-proxy=false --validate WARN[2023-12-29T11:38:30.752523879Z] failed to lookup default userland-proxy binary error="exec: \"docker-proxy\": executable file not found in $PATH" configuration OK Specifying a non-absolute path produces an error: dockerd --userland-proxy-path=docker-proxy --validate invalid userland-proxy-path: must be an absolute path: docker-proxy Before this patch, we would not validate this path, which would allow the daemon to start, but fail to map a port; docker run -d -P nginx:alpine 4f7b6589a1680f883d98d03db12203973387f9061e7a963331776170e4414194 docker: Error response from daemon: driver failed programming external connectivity on endpoint romantic_wiles (7cfdc361821f75cbc665564cf49856cf216a5b09046d3c22d5b9988836ee088d): fork/exec docker-proxy: no such file or directory. 
Specifying an invalid userland-proxy-path produces an error as well: dockerd --userland-proxy-path=/usr/local/bin/no-such-binary --validate userland-proxy-path is invalid: stat /usr/local/bin/no-such-binary: no such file or directory mkdir -p /usr/local/bin/not-a-file dockerd --userland-proxy-path=/usr/local/bin/not-a-file --validate userland-proxy-path is invalid: exec: "/usr/local/bin/not-a-file": is a directory touch /usr/local/bin/not-an-executable dockerd --userland-proxy-path=/usr/local/bin/not-an-executable --validate userland-proxy-path is invalid: exec: "/usr/local/bin/not-an-executable": permission denied Same when using the daemon.json config-file; echo '{"userland-proxy-path":"no-such-binary"}' > /etc/docker/daemon.json dockerd --validate unable to configure the Docker daemon with file /etc/docker/daemon.json: merged configuration validation from file and command line flags failed: invalid userland-proxy-path: must be an absolute path: no-such-binary dockerd --userland-proxy-path=hello --validate unable to configure the Docker daemon with file /etc/docker/daemon.json: the following directives are specified both as a flag and in the configuration file: userland-proxy-path: (from flag: hello, from file: /usr/local/bin/docker-proxy) Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2023-12-29 16:23:18 +01:00
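The startup-time validation described in this commit could be sketched as below. This is an illustration under assumptions, not the daemon's code: `validateProxyPath` is a hypothetical helper, the error strings are borrowed from the sample output in the commit message, and the exact wording produced for a missing or non-executable file may differ from the daemon's.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"path/filepath"
)

// validateProxyPath checks a userland-proxy path once, up front, instead of
// on every port mapping. enabled mirrors the "userland-proxy" setting.
func validateProxyPath(enabled bool, path string) error {
	if path == "" {
		if enabled {
			return errors.New("userland-proxy is enabled, but userland-proxy-path is not set")
		}
		return nil // proxy disabled: a missing binary is not an error
	}
	if !filepath.IsAbs(path) {
		return fmt.Errorf("invalid userland-proxy-path: must be an absolute path: %s", path)
	}
	// For a path containing a separator, exec.LookPath does not search PATH;
	// it only verifies that the file exists and is executable.
	if _, err := exec.LookPath(path); err != nil {
		return fmt.Errorf("userland-proxy-path is invalid: %w", err)
	}
	return nil
}

func main() {
	fmt.Println(validateProxyPath(true, ""))
	fmt.Println(validateProxyPath(false, ""))
	fmt.Println(validateProxyPath(true, "docker-proxy"))
}
```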
Rob Murray	141cb65e51	Check, then assume an IPv6 bridge has a subnet. If IPv6 is enabled for a bridge network, by the time configuration is applied, the bridge will always have an address. Assert that, by raising an error when the configuration is validated. Use that to simplify the logic used to calculate which addresses should be assigned to a bridge. Also remove a redundant check in setupGatewayIPv6() and the error associated with it. Fix unit tests that enabled IPv6, but didn't supply an IPv6 IPAM address/pool. Before this change, these tests passed but silently left the bridge without an IPv6 address. (The daemon already ensured there was an IPv6 address, this change does not add a new restriction on config at that level.) Signed-off-by: Rob Murray <rob.murray@docker.com>	2023-12-21 15:26:34 +00:00
Rob Murray	437bc829bf	Don't try to validate incomplete network config. Some checks in 'networkConfiguration.Validate()' were not running as expected, they'd always pass - because 'parseNetworkOptions()' called it before 'config.processIPAM()' had added IP addresses and gateways. Signed-off-by: Rob Murray <rob.murray@docker.com>	2023-12-21 15:16:26 +00:00
Rob Murray	52d9b0cb56	Remove unused error types. Signed-off-by: Rob Murray <rob.murray@docker.com>	2023-12-21 12:47:59 +00:00
Sebastiaan van Stijn	388216fc45	Merge pull request #46850 from robmry/46829-allow_ipv6_subnet_change Allow overlapping change in bridge's IPv6 network.	2023-12-19 18:35:13 +01:00
Rob Murray	27f3abd893	Allow overlapping change in bridge's IPv6 network. Calculate the IPv6 addresses needed on a bridge, then reconcile them with the addresses on an existing bridge by deleting then adding as required. (Previously, required addresses were added one-by-one, then unwanted addresses were removed. This meant the daemon failed to start if, for example, an existing bridge had address '2000:db8::/64' and the config was changed to '2000:db8::/80'.) IPv6 addresses are now calculated and applied in one go, so there's no need for setupVerifyAndReconcile() to check the set of IPv6 addresses on the bridge. And, it was guarded by !config.InhibitIPv4, which can't have been right. So, removed its IPv6 parts, and added IPv4 to its name. Link local addresses, the example given in the original ticket, are now released when containers are stopped. Not releasing them meant that when using an LL subnet on the default bridge, no container could be started after a container was stopped (because the calculated address could not be re-allocated). In non-default bridge networks using an LL subnet, addresses leaked. Linux always uses the standard 'fe80::/64' LL network. So, if a bridge is configured with an LL subnet prefix that overlaps with it, a config error is reported. Non-overlapping LL subnet prefixes are allowed. Signed-off-by: Rob Murray <rob.murray@docker.com>	2023-12-18 16:10:41 +00:00
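The "calculate, then delete, then add" reconciliation described above can be sketched with plain set differences. `reconcile` and `mustPrefixes` are hypothetical helpers, and the host address `2000:db8::1` is an assumption made so the prefixes parse as interface addresses; the driver's own types and netlink calls are not shown.

```go
package main

import (
	"fmt"
	"net/netip"
)

// reconcile compares the addresses already on a bridge with the addresses
// the configuration requires. The caller is expected to apply del before
// add, so an overlapping replacement (/64 -> /80) cannot conflict.
func reconcile(existing, wanted []netip.Prefix) (del, add []netip.Prefix) {
	want := make(map[netip.Prefix]bool, len(wanted))
	for _, p := range wanted {
		want[p] = true
	}
	have := make(map[netip.Prefix]bool, len(existing))
	for _, p := range existing {
		have[p] = true
		if !want[p] {
			del = append(del, p)
		}
	}
	for _, p := range wanted {
		if !have[p] {
			add = append(add, p)
		}
	}
	return del, add
}

// mustPrefixes parses address/prefix-length strings and panics on bad input.
func mustPrefixes(ss ...string) []netip.Prefix {
	out := make([]netip.Prefix, 0, len(ss))
	for _, s := range ss {
		out = append(out, netip.MustParsePrefix(s))
	}
	return out
}

func main() {
	del, add := reconcile(
		mustPrefixes("2000:db8::1/64"),
		mustPrefixes("2000:db8::1/80"),
	)
	fmt.Println("delete first:", del)
	fmt.Println("then add:", add)
}
```

Computing both sets before touching the bridge is what removes the ordering problem: nothing is added while a conflicting address is still present.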
Sebastiaan van Stijn	2cf230951f	add //go:build directives to prevent downgrading to go1.16 language This repository is not yet a module (i.e., does not have a `go.mod`). This is not problematic when building the code in GOPATH or "vendor" mode, but when using the code as a module-dependency (in module-mode), different semantics are applied since Go1.21, which switches Go _language versions_ on a per-module, per-package, or even per-file base. A condensed summary of that logic [is as follows][1]: - For modules that have a go.mod containing a go version directive; that version is considered a minimum _required_ version (starting with the go1.19.13 and go1.20.8 patch releases: before those, it was only a recommendation). - For dependencies that don't have a go.mod (not a module), go language version go1.16 is assumed. - Likewise, for modules that have a go.mod, but the file does not have a go version directive, go language version go1.16 is assumed. - If a go.work file is present, but does not have a go version directive, language version go1.17 is assumed. When switching language versions, Go _downgrades_ the language version, which means that language features (such as generics, and `any`) are not available, and compilation fails. For example: # github.com/docker/cli/cli/context/store /go/pkg/mod/github.com/docker/cli@v25.0.0-beta.2+incompatible/cli/context/store/storeconfig.go:6:24: predeclared any requires go1.18 or later (-lang was set to go1.16; check go.mod) /go/pkg/mod/github.com/docker/cli@v25.0.0-beta.2+incompatible/cli/context/store/store.go:74:12: predeclared any requires go1.18 or later (-lang was set to go1.16; check go.mod) Note that these fallbacks are per-module, per-package, and can even be per-file, so _(indirect) dependencies_ can still use modern language features, as long as their respective go.mod has a version specified. 
Unfortunately, these failures do not occur when building locally (using vendor / GOPATH mode), but will affect consumers of the module. Obviously, this situation is not ideal, and the ultimate solution is to move to go modules (add a go.mod), but this comes with a non-insignificant risk in other areas (due to our complex dependency tree). We can revert to using go1.16 language features only, but this may be limiting, and may still be problematic when (e.g.) matching signatures of dependencies. There is an escape hatch: adding a `//go:build` directive to files that make use of go language features. From the [go toolchain docs][2]: > The go line for each module sets the language version the compiler enforces > when compiling packages in that module. The language version can be changed > on a per-file basis by using a build constraint. > > For example, a module containing code that uses the Go 1.21 language version > should have a `go.mod` file with a go line such as `go 1.21` or `go 1.21.3`. > If a specific source file should be compiled only when using a newer Go > toolchain, adding `//go:build go1.22` to that source file both ensures that > only Go 1.22 and newer toolchains will compile the file and also changes > the language version in that file to Go 1.22. This patch adds `//go:build` directives to those files using recent additions to the language. It's currently using go1.19 as version to match the version in our "vendor.mod", but we can consider being more permissive ("any" requires go1.18 or up), or more "optimistic" (force go1.21, which is the version we currently use to build). For completeness sake, note that any file _without_ a `//go:build` directive will continue to use go1.16 language version when used as a module. [1]: `58c28ba286/src/cmd/go/internal/gover/version.go (L9-L56)` [2]: https://go.dev/doc/toolchain Signed-off-by: Sebastiaan van Stijn <github@gone.nl>	2023-12-15 15:24:15 +01:00
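As a minimal illustration of the escape hatch described above, a file that uses `any` can carry a build constraint that also raises its language version. The file contents below are a made-up example, not a file from the repository; `go1.19` is chosen only because the commit message says that is the version used.

```go
//go:build go1.19

package main

import "fmt"

// describe uses the predeclared identifier any, which requires Go 1.18 or
// later. Without the build constraint above, a consumer importing a package
// that has no go.mod would compile this file as go1.16 and fail.
func describe(v any) string {
	return fmt.Sprintf("%T", v)
}

func main() {
	fmt.Println(describe(42)) // prints "int"
}
```

The constraint must appear before the `package` clause and be followed by a blank line, otherwise it is treated as an ordinary comment.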
Rob Murray	964ab7158c	Explicitly set MTU on bridge devices. This is purely cosmetic - if a non-default MTU is configured, the bridge will have the default MTU=1500 until a container's 'veth' is connected and an MTU is set on the veth. That's disconcerting, it looks like the config has been ignored - so, set the bridge's MTU explicitly. Fixes #37937 Signed-off-by: Rob Murray <rob.murray@docker.com>	2023-11-27 11:18:54 +00:00

912 Commits