Compare commits

88 Commits

Author SHA1 Message Date
Cory Snider
f415784c1a Merge pull request #51575 from smerkviladze/25.0-add-windows-integration-tests
[25.0 backport] integration: add Windows network driver and isolation tests
2025-11-26 18:55:24 -05:00
Sopho Merkviladze
4ef26e4c35 integration: add Windows network driver and isolation tests
Add integration tests for Windows container functionality focusing on network drivers and container isolation modes.

Signed-off-by: Sopho Merkviladze <smerkviladze@mirantis.com>
2025-11-26 13:24:13 +04:00
Sebastiaan van Stijn
2b409606ac Merge pull request #51344 from smerkviladze/50179-25.0-windows-gha-updates
[25.0 backport] gha: update to windows 2022 / 2025
2025-10-30 22:20:37 +01:00
Sebastiaan van Stijn
00fbff3423 integration/networking: increase context timeout for attach
The TestBridgeICCWindows test was failing on Windows due to a context timeout:

=== FAIL: github.com/docker/docker/integration/networking TestBridgeICCWindows/User_defined_nat_network (9.02s)
    bridge_test.go:243: assertion failed: error is not nil: Post "http://%2F%2F.%2Fpipe%2Fdocker_engine/v1.44/containers/62a4ed964f125e023cc298fde2d4d2f8f35415da970fd163b24e181b8c0c6654/start": context deadline exceeded
    panic.go:635: assertion failed: error is not nil: Error response from daemon: error while removing network: network mynat id 25066355c070294c1d8d596c204aa81f056cc32b3e12bf7c56ca9c5746a85b0c has active endpoints

=== FAIL: github.com/docker/docker/integration/networking TestBridgeICCWindows (17.65s)

Windows appears to be slower to start, so these timeouts are expected.
Increase the context timeout to give it a little more time.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 0ea28fede0)
Signed-off-by: Sopho Merkviladze <smerkviladze@mirantis.com>
2025-10-30 21:58:10 +04:00
Sebastiaan van Stijn
92df858a5b integration-cli: TestCopyFromContainerPathIsNotDir: adjust for win 2025
It looks like the error returned by Windows changed in Windows 2025; before
Windows 2025, this produced an `ERROR_INVALID_NAME`:

    The filename, directory name, or volume label syntax is incorrect.

But Windows 2025 produces an `ERROR_DIRECTORY` ("The directory name is invalid."):

    CreateFile \\\\?\\Volume{d9f06b05-0405-418b-b3e5-4fede64f3cdc}\\windows\\system32\\drivers\\etc\\hosts\\: The directory name is invalid.

Docs: https://learn.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit d3d20b9195)
Signed-off-by: Sopho Merkviladze <smerkviladze@mirantis.com>
2025-10-30 14:19:54 +04:00
Sebastiaan van Stijn
00f9f839c6 gha: run windows 2025 on PRs, 2022 scheduled
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 9316396db0)
Signed-off-by: Sopho Merkviladze <smerkviladze@mirantis.com>
2025-10-30 13:37:45 +04:00
Sebastiaan van Stijn
acd2546285 gha: update to windows 2022 / 2025
The hosted Windows 2019 runners reach EOL on June 30:
https://github.com/actions/runner-images/issues/12045

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 6f484d0d4c)
Signed-off-by: Sopho Merkviladze <smerkviladze@mirantis.com>
2025-10-30 13:07:40 +04:00
Sebastiaan van Stijn
d334795adb Merge pull request #51129 from smerkviladze/25.0-bump-swarmkit-to-v2.1.1
[25.0 backport] vendor: github.com/moby/swarmkit/v2 v2.1.1
2025-10-09 20:51:21 +02:00
Sopho Merkviladze
71967c3a82 vendor: github.com/moby/swarmkit/v2 v2.1.1
- Afford NetworkAllocator passing per-app state in Control API responses
- manager: restore NewServer to v2 signature
- api: afford passing params to OnGetNetwork hook
- Remove weak TLS cipher suites

full diff: https://github.com/moby/swarmkit/compare/v2.0.0...v2.1.1

Signed-off-by: Sopho Merkviladze <smerkviladze@mirantis.com>
(cherry picked from commit ca9c5c6f7b)
Signed-off-by: Sopho Merkviladze <smerkviladze@mirantis.com>
2025-10-09 21:52:39 +04:00
Sebastiaan van Stijn
f06fd6d3c9 Merge pull request #51117 from austinvazquez/cherry-pick-fix-go-validation-checks-to-25.0
[25.0] Rework Go mod tidy/vendor checks
2025-10-07 23:09:20 +02:00
Sebastiaan van Stijn
ce61e5777b Merge pull request #51084 from smerkviladze/25.0-enable-strong-ciphers
[25.0] daemon: add support for DOCKER_DISABLE_WEAK_CIPHERS env-var to enforce strong TLS ciphers
2025-10-07 20:49:07 +02:00
Sopho Merkviladze
26d6c35b1b daemon: optionally enforce strong TLS ciphers via environment variable
Introduce the DOCKER_DISABLE_WEAK_CIPHERS environment variable to allow
disabling weak TLS ciphers. When set to true, the daemon restricts
TLS to a modern, secure subset of cipher suites, disabling known weak
ciphers such as CBC-mode ciphers.

This is intended as an edge-case option and is not exposed via a CLI flag or
config option. By default, weak ciphers remain enabled for backward compatibility.

Signed-off-by: Sopho Merkviladze <smerkviladze@mirantis.com>
2025-10-07 21:24:07 +04:00
Sebastiaan van Stijn
a14b16e1f3 Merge pull request #51123 from crazy-max/25.0_ci-cache-fixes
[25.0] ci: update gha cache attributes
2025-10-07 19:19:53 +02:00
CrazyMax
3ea40f50ef ci: update gha cache attributes
Signed-off-by: CrazyMax <1951866+crazy-max@users.noreply.github.com>
2025-10-07 12:42:46 +02:00
Austin Vazquez
7c47f6d831 Rework Go mod tidy/vendor checks
This change reworks the Go mod tidy/vendor checks to run for all Go modules tracked by the project and to fail on any uncommitted changes.

Signed-off-by: Austin Vazquez <austin.vazquez@docker.com>
(cherry picked from commit f6e1bf2808)
Signed-off-by: Austin Vazquez <austin.vazquez@docker.com>
2025-10-06 19:32:59 -05:00
Austin Vazquez
0847330073 Add existence check for go.mod and go.sum files
Signed-off-by: Austin Vazquez <austin.vazquez@docker.com>
(cherry picked from commit 0ad35e3ef0)
Signed-off-by: Austin Vazquez <austin.vazquez@docker.com>
2025-10-06 19:25:50 -05:00
Paweł Gronowski
b4c0ebf6d4 Merge pull request #50939 from vvoland/50936-25.0
[25.0 backport] Dockerfile.windows: remove deprecated 7Zip4Powershell
2025-09-09 19:35:21 +02:00
Paweł Gronowski
00f6814357 Dockerfile.windows: remove deprecated 7Zip4Powershell
The `tar` utility is included in Windows 10 (17063+) and Windows Server
2019+, so we can use it directly.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
(cherry picked from commit 8c8324b37f)
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2025-09-09 17:35:49 +02:00
Cory Snider
165516eb47 Merge pull request #50551 from corhere/backport-25.0/libn/all-the-overlay-fixes
[25.0] libnetwork/overlay: backport all the fixes
2025-08-11 16:52:52 -04:00
Cory Snider
f099e911bd libnetwork: handle coalesced endpoint events
The eventually-consistent nature of NetworkDB means we cannot depend on
events being received in the same order that they were sent. Nor can we
depend on receiving events for all intermediate states. It is possible
for a series of entry UPDATEs, or a DELETE followed by a CREATE with the
same key, to get coalesced into a single UPDATE event on the receiving
node. Watchers of NetworkDB tables therefore need to be prepared to
gracefully handle arbitrary UPDATEs of a key, including those where the
new value may have nothing in common with the previous value.

The libnetwork controller naively handled events for endpoint_table
assuming that an endpoint leave followed by a rejoin of the same
endpoint would always be expressed as a DELETE event followed by a
CREATE. It would handle a coalesced UPDATE as a CREATE, adding a new
service binding without removing the old one. This had various side
effects; for example, the supposedly transient state in which conflicting
service bindings leave an IP address assigned to more than one endpoint
would never settle.

Modify the libnetwork controller to handle an UPDATE by removing the
previous service binding then adding the new one.
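
The remove-then-add handling described above can be sketched as follows (a toy model: the endpoint keys, IP strings, and helper names are hypothetical stand-ins for the controller's real service-binding types):

```go
package main

// active models the controller's service-binding state as a set keyed by
// endpoint key and IP; names here are illustrative, not moby's real types.
var active = map[string]bool{}

func addBinding(key, ip string)    { active[key+"→"+ip] = true }
func removeBinding(key, ip string) { delete(active, key+"→"+ip) }

// handleEndpointEvent treats a coalesced UPDATE (prevIP and curIP both
// non-empty) as a delete of the previous binding followed by a create of
// the new one, instead of naively adding a second conflicting binding.
func handleEndpointEvent(key, prevIP, curIP string) {
	if prevIP != "" {
		removeBinding(key, prevIP) // tear down the stale binding first
	}
	if curIP != "" {
		addBinding(key, curIP)
	}
}
```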

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 4538a1de0a)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
bace1b8a3b libnetwork/d/overlay: handle coalesced peer updates
The eventually-consistent nature of NetworkDB means we cannot depend on
events being received in the same order that they were sent. Nor can we
depend on receiving events for all intermediate states. It is possible
for a series of entry UPDATEs, or a DELETE followed by a CREATE with the
same key, to get coalesced into a single UPDATE event on the receiving
node. Watchers of NetworkDB tables therefore need to be prepared to
gracefully handle arbitrary UPDATEs of a key, including those where the
new value may have nothing in common with the previous value.

The overlay driver naively handled events for overlay_peer_table
assuming that an endpoint leave followed by a rejoin of the same
endpoint would always be expressed as a DELETE event followed by a
CREATE. It would handle a coalesced UPDATE as a CREATE, inserting a new
entry into peerDB without removing the old one. This had various side
effects; for example, the supposedly transient state of multiple peerDB
entries sharing the same peer IP would never settle.

Update driverapi to pass both the previous and new value of a table
entry into the driver. Modify the overlay driver to handle an UPDATE by
removing the previous peer entry from peerDB then adding the new one.
Modify the Windows overlay driver to match.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit e1a586a9a7)

libn/d/overlay: don't deref nil PeerRecord on error

If unmarshaling the peer record fails, there is no need to check if it's
a record for a local peer. Attempting to do so anyway will result in a
nil-dereference panic. Don't do that.

The Windows overlay driver has a typo: prevPeer is being checked twice
for whether it was a local-peer record. Check prevPeer once and newPeer
once each, as intended.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 12c6345d3a)

Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
f9e54290b5 libn/d/win/overlay: dedupe NetworkDB definitions
Windows and Linux overlay driver instances are interoperable, working
from the same NetworkDB table for peer discovery. As both drivers
produce and consume serialized data through the table, they both need to
have a shared understanding of the shape and semantics of that data.
The Windows overlay driver contains a duplicate copy of the protobuf
definitions used for marshaling and unmarshaling the NetworkDB peer
entries for dubious reasons. It gives us the flexibility to have the
definitions diverge, which is only really useful for shooting ourselves
in the foot.

Make libnetwork/drivers/overlay the source of truth for the peer record
definitions and the name of the NetworkDB table for distributing peer
records.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 8340e109de)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
fc3df55230 libn/d/overlay: extract hashable address types
The macAddr and ipmac types are generally useful within libnetwork. Move
them to a dedicated package and overhaul the API to be more like that of
the net/netip package.

Update the overlay driver to utilize these types, adapting to the new
API.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit c7b93702b9)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
b22872af60 libnetwork/driverapi: make EventNotify optional
Overlay is the only driver which makes use of the EventNotify facility,
yet all other driver implementations are forced to provide a stub
implementation. Move the EventNotify and DecodeTableEntry methods into a
new optional TableWatcher interface and remove the stubs from all the
other drivers.
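
The optional-interface pattern this describes is idiomatic Go: callers type-assert to the narrower interface and skip drivers that did not opt in. A minimal sketch with stand-in types (Driver, TableWatcher, and notify here are illustrative, not the exact driverapi shapes):

```go
package main

// Driver is a trimmed stand-in for driverapi.Driver.
type Driver interface {
	Type() string
}

// TableWatcher is the optional interface: only drivers that care about
// NetworkDB table events implement it, so other drivers need no stubs.
type TableWatcher interface {
	EventNotify(tname, key string, prev, value []byte)
}

// notify fans an event out to a driver only if it opted in, reporting
// whether the driver was a TableWatcher.
func notify(d Driver, tname, key string, prev, value []byte) bool {
	tw, ok := d.(TableWatcher) // optional-interface type assertion
	if !ok {
		return false
	}
	tw.EventNotify(tname, key, prev, value)
	return true
}

type plainDriver struct{}

func (plainDriver) Type() string { return "plain" }

type watchingDriver struct{ events int }

func (watchingDriver) Type() string { return "watcher" }

func (w *watchingDriver) EventNotify(tname, key string, prev, value []byte) {
	w.events++
}
```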

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 844023f794)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
c7e17ae65d libn/networkdb: report prev value in update events
When handling updates to existing entries, it is often necessary to know
what the previous value was. NetworkDB knows the previous and new values
when it broadcasts an update event for an entry. Include both values in
the update event so the watchers do not have to do their own parallel
bookkeeping.

Unify the event types under a single WatchEvent, as representing the
operation kind in the type system proved inconvenient rather than useful.
The operation is now implied by the nilness of the Value and Prev event
fields.
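
A minimal sketch of such a unified event type (field names follow the commit text; the real WatchEvent may carry more fields):

```go
package main

// WatchEvent mirrors the unified event shape the commit describes: the
// operation kind is implied by which of Value/Prev is nil, rather than
// being encoded in the type system.
type WatchEvent struct {
	Table, Key  string
	Value, Prev []byte // Value == nil => delete; Prev == nil => create
}

func (e WatchEvent) IsCreate() bool { return e.Prev == nil && e.Value != nil }
func (e WatchEvent) IsUpdate() bool { return e.Prev != nil && e.Value != nil }
func (e WatchEvent) IsDelete() bool { return e.Prev != nil && e.Value == nil }
```

Carrying Prev in the event is what lets watchers such as the overlay driver handle coalesced updates without parallel bookkeeping.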

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 69c3c56eba)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
d60c71a9d7 libnetwork/d/overlay: fix logical race conditions
The concurrency control in the overlay driver is logically unsound.
While the use of mutexes is sufficient to prevent data races --
violations of the Go memory model -- many operations which need to be
atomic are performed with unbounded concurrency.

Overhaul the use of locks in the overlay network driver. Implement sound
locking at the network granularity: operations may proceed concurrently
iff they are being applied to distinct networks. Push the responsibility
of locking up to the code which calls methods or accesses struct fields
to avoid deadlock situations like we had previously with
d.initSandboxPeerDB() and to make the code easier to reason about.

Each overlay network has a distinct peer db. The NetworkDB watch for the
overlay peer table for the network will only start after
(*driver).CreateNetwork returns and will be stopped before libnetwork
calls (*driver).DeleteNetwork, therefore the lifetime of the peer db for
a network is constrained to the lifetime of the network itself. Yet the
peer db for a network is tracked in a dedicated map, separately from the
network objects themselves. This has resulted in a parallel set of
mutexes to manage concurrency of the peer db distinct from the mutexes
for the driver and networks. Move the peer db for a network into a field
of the network struct and guard it from concurrent access using the
per-network lock. Move the methods for manipulating the peer db into the
network struct so that the methods can only be called if the caller has
a reference to the network object.

Network creation and deletion are synchronized using the driver-scope
mutex, but some of the kernel programming is performed outside of the
critical section. It is possible for network deletion to race with
recreating the network, interleaving the kernel programming for the
network creation and deletion, resulting in inconsistent kernel state.
Parallelize network creation and deletion soundly. Use a double-checked
locking scheme to soundly handle the case of concurrent CreateNetwork
and DeleteNetwork for the same network id without blocking operations
on other networks. Synchronize operations on a network so that
operations on the network such as adding a neighbor to the peer db are
performed atomically, not interleaved with deleting the network.
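
The double-checked locking scheme might be sketched as follows (a simplified model; the real driver performs kernel programming inside these critical sections, and the type layout here is an assumption):

```go
package main

import "sync"

// network carries its own lock so operations on distinct networks can
// proceed concurrently; deleted marks a network that lost the race.
type network struct {
	mu      sync.Mutex
	deleted bool
}

type driver struct {
	mu       sync.Mutex // guards only the networks map
	networks map[string]*network
}

// getOrCreate uses double-checked locking: look the network up under the
// driver lock, then re-validate under the per-network lock that it was not
// deleted between the two steps, retrying if it was.
func (d *driver) getOrCreate(nid string) *network {
	for {
		d.mu.Lock()
		n, ok := d.networks[nid]
		if !ok {
			n = &network{}
			d.networks[nid] = n
		}
		d.mu.Unlock()

		n.mu.Lock()
		if !n.deleted {
			n.mu.Unlock()
			return n
		}
		n.mu.Unlock() // lost a race with deleteNetwork; retry
	}
}

func (d *driver) deleteNetwork(nid string) {
	d.mu.Lock()
	n, ok := d.networks[nid]
	if ok {
		delete(d.networks, nid)
	}
	d.mu.Unlock()
	if ok {
		n.mu.Lock()
		n.deleted = true // kernel teardown would happen here, atomically
		n.mu.Unlock()
	}
}
```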

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 89d3419093)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
ad54b8f9ce libn/d/overlay: fix encryption race conditions
There is a dedicated mutex for synchronizing access to the encrMap.
Separately, the main driver mutex is used for synchronizing access to
the encryption keys. Their use is sufficient to prevent data races (if
used correctly, which is not the case) but not logical race conditions.
Programming the encryption parameters for a peer can race with
encryption keys being updated, which could lead to inconsistencies
between the parameters programmed into the kernel and the desired state.

Introduce a new mutex for synchronizing encryption operations. Use that
mutex to synchronize access to both encrMap and keys. Handle encryption
key updates in a critical section so they can no longer be interleaved
with kernel programming of encryption parameters.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 843cd96725)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
8075689abd libn/d/overlay: inline secMapWalk into only caller
func (*driver) secMapWalk is a curious beast. It is named walk, yet it
also mutates the collection being iterated over. It returns an error,
but that error is always nil. It takes a callback that can break
iteration, yet the only caller makes no use of that affordance. Its
utility is limited and the abstraction hinders readability more than it
helps. Open-code the d.secMap.nodes loop into
func (*driver) updateKeys(), the only caller.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit a1d299749c)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
480dfaef06 libnetwork/d/overlay: un-embed mutexes
It is easier to find all references when they are struct fields rather
than embedded structs.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 74713e1a7d)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
e604d70e22 libnetwork/d/overlay: ref-count encryption params
The IPsec encryption parameters (Security Association Database and
Security Policy Database entries) for a particular overlay network peer
(VTEP) are shared global state as they have to be programmed into the
root network namespace. The same parameters are used when encrypting
VXLAN traffic to a particular VTEP for all overlay networks. Deleting
the entries for a VTEP will break encryption to that VTEP across all
encrypted overlay networks, therefore the decision of when to delete the
entries must take the state of all overlay networks into account.
Unfortunately this is not the case.

The overlay driver uses local per-network state to decide when to
program and delete the parameters for a VTEP. In practice, the
parameters for all VTEPs participating in an encrypted overlay network
are deleted when the network is deleted. Encryption to that VTEP over
all other active encrypted overlay networks would be broken until some
other incidental peerDB event triggered a re-programming of the
parameters for that VTEP.

Change the setupEncryption and removeEncryption functions to be
reference-counted. The removeEncryption function needs to be called the
same number of times as setupEncryption before the parameters are deleted
from the kernel.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 057e35dd65)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Sebastiaan van Stijn
b6b13b20af libnetwork/drivers/overlay: fix naked returns, output variables
libnetwork/drivers/overlay/encryption.go:370:2: naked return in func `programSA` with 64 lines of code (nakedret)
        return
        ^

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 02b4c7cc52)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
b539aea3cd libnetwork/d/overlay: properly model peer db
The overlay driver assumes that the peer table in NetworkDB will always
converge to a 1:1:1 mapping from peer endpoint IP address to MAC address
to VTEP. While this currently holds true in practice most of the time,
it is not an invariant and there are ways that users can violate this
assumption.

The driver detects whether peer entries conflict with each other by
matching up (IP, MAC) tuples. In the common case this works out fine as
the MAC address for an endpoint is generally derived from the assigned
IP address. If an IP address gets reassigned to a container on another
node the MAC address will follow, so the driver's conflict resolution
logic will behave as intended. However users may explicitly configure
the MAC address for a container's network endpoints. If an IP address
gets reassigned from a container with an auto-generated MAC address to a
container with a manually-configured MAC, or vice versa, the driver
would not detect the conflict as the (IP, MAC) tuples won't match up. It
would attempt to program the kernel's neighbor table with two
conflicting MAC addresses for one IP, which will fail. And since it
does not realize that there is a conflict, the driver won't reprogram
the kernel from the remaining entry when the other entry is deleted.

The assumption that only one IP address may resolve to a given MAC
address is violated if multiple IP addresses are assigned to an
endpoint. This rarely comes up in practice today as the overlay driver
only supports IPv4 single-stack connectivity for endpoints. If multiple
distinct peer entries exist with the same MAC address, the driver will
delete the MAC->VTEP mapping from the kernel's forwarding database when
any entry is deleted, even if other entries remain active. This
limitation is one of the biggest obstacles in the way of supporting IPv6
and dual-stack connectivity for endpoints attached to overlay networks.

Modify the peer db logic to correctly handle the cases where peer
entries have non-unique MAC or VTEP values. Treat any set of entries
with non-unique IP addresses as a conflict, irrespective of the entries'
MAC addresses. Maintain a reference count of forwarding database entries
and only delete the MAC->VTEP mapping from the kernel when there are no
longer any neighbor entries which resolve to that MAC.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 1c2b744ca2)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
e43e322a3b libnetwork/d/overlay: refactor peer db impl
The peer db implementation is more complex than it needs to be.
Notably, the peerCRUD / peerCRUDOp function split is a vestige of its
evolution from a worker goroutine receiving commands over a channel.

Refactor the peer db operations to be easier to read, understand and
modify. Factor the kernel-programming operations out into dedicated
addNeighbor and deleteNeighbor functions. Inline the rest of the
peerCRUDOp functions into their respective peerCRUD wrappers.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 59437f56f9)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
89ea2469df libnetwork/d/overlay: drop initEncryption function
The (*driver).Join function does many things to set up overlay
networking. One of the first things it does is call
(*network).joinSandbox, which in turn calls (*driver).initSandboxPeerDB.
The initSandboxPeerDB function iterates through the peer db to add
entries to the VXLAN FDB, neighbor table and IPsec security association
database in the kernel for all known peers on the overlay network.

One of the last things the (*driver).Join function does is call
(*driver).initEncryption. The initEncryption function iterates through
the peer db to add entries to the IPsec security association database in
the kernel for all known peers on the overlay network. But the preceding
initSandboxPeerDB call already did that! The initEncryption function is
redundant and can safely be removed.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit df6b405796)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
f69e64ab12 libnetwork/d/overlay: drop checkEncryption function
In addition to being three functions in a trenchcoat, the
checkEncryption function has a very subtle implementation which is
difficult to reason about. That is not a good property for security
relevant code to have.

Replace two of the three calls to checkEncryption with conditional calls
to setupEncryption and removeEncryption, lifting the conditional logic
which was hidden away in checkEncryption into the call sites to make it
easier to reason about the code. Replace the third call with a call to a
new initEncryption function.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 713f887698)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
67fbdf3c28 libnetwork/d/overlay: make setupEncryption a method
The setupEncryption and removeEncryption functions take several
parameters, but all call sites pass the same values for all the
parameters aside from remoteIP: values taken from fields of the driver
struct. Refactor these functions to be methods of the driver struct and
drop the redundant parameters.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit cb4e7b2f03)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
33a7e83e6d libnetwork/d/overlay: checkEncryption: drop isLocal param
Since it is not meaningful to add or remove encryption between the local
node and itself, the isLocal parameter is redundant. Setting up
encryption for all network peers is now invoked by calling

    checkEncryption(nid, netip.Addr{}, true)

Calling checkEncryption with isLocal=true, add=false is now more
explicitly a no-op. It always was effectively a no-op, but that was not
easy to spot by inspection. In the world with the isLocal flag,
calls to checkEncryption where isLocal=true and add=false would have rIP
set to d.advertiseAddr. In other words, it was a request to remove
encryption parameters between the local peer and itself if peerDB had no
remote-peer entries for the network. So either the call would do
nothing, or it would remove encryption parameters that aren't used for
anything. Now the equivalent call always does nothing.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 0d893252ac)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
684b2688d2 libnetwork/d/overlay: peerdb: drop isLocal param
Drop the isLocal boolean parameters from the peerDB functions. Local
peers have vtep == netip.Addr{}.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 4b1c1236b9)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
b61930cc82 libnetwork/d/overlay: elide vtep for local peers
The VTEP value for a peer in peerDB is only accurate for a remote peer.
The VTEP for a local peer would be the driver's advertise address, which
is not necessarily constant for the lifetime of the driver instance.
The VTEP values persisted in the peerDB entries for local peers could be
stale or missing if not kept in sync with the advertise address. And the
peerDB could get polluted with duplicate entries for local peers if the
advertise address was to change, as entries which differ only by VTEP
are considered distinct by SetMatrix. Persisting the advertise address
as the VTEP for local peers creates lots of problems that are not easy
to solve.

Stop persisting the VTEP for local peers in peerDB. Any code that needs
to know the VTEP for local peers can look that up from the source of
truth: the driver's advertise address. Use the lack of a VTEP in peerDB
entries to signify local peers, making the isLocal flag redundant.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 48e0b24ff7)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
1db0510301 libnetwork/d/overlay: filter local peers explicitly
The overlay driver's checkEncryption function configures the IPsec
parameters for the VXLAN tunnels to peer nodes. When called with
isLocal=true, it configures encryption for all peer nodes with at least
one peerDB entry. Since the local peers are also included in the peerDB,
it needs to filter those entries out. It does so by filtering out any
peer entries whose VTEP address is equal to the current local advertise
address. Trouble is, the local advertise address is not necessarily
constant. The driver tries to handle this case by calling
peerDBUpdateSelf() when the advertise address changes. This function
iterates through the peerDB and tries to update the VTEP address for all
local peer entries, but it does not actually do anything: it mutates a
temporary copy of the entry which is not persisted back into the peerDB.
(It used to be functional, but was broken when the peerDB was extended
to use SetMatrix.) So there may be cases where local peer entries are
not filtered out properly, resulting in spurious encryption parameters
being programmed into the kernel.

Filter out local peers when walking the peerDB by filtering on whether
the entry has the isLocal flag set. Remove the no-op code which attempts
to update local entries in the peerDB. No other code takes any interest
in the VTEP value for isLocal peer entries.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit a9e2d6d06e)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
9ff06c515c libn/d/overlay: use netip types more
The netip types are really useful for tracking state in the overlay
driver as they are hashable, unlike net.IP and friends, making them
directly useable as map keys. Converting between netip and net types is
fairly trivial, but fewer conversions is more ergonomic.

The NetworkDB entries for the overlay peer table encode the IP addresses
as strings. We need to parse them to some representation before
processing them further. Parse directly into netip types and pass those
values around to cut down on the number of conversions needed.

The peerDB needs to marshal the keys and entries to structs of hashable
values to be able to insert them into the SetMatrix. Use netip.Addr in
peerEntry so that peerEntry values can be directly inserted into the
SetMatrix without conversions. Use a hashable struct type as the
SetMatrix key to avoid having to marshal the whole struct to a string
and parse it back out.

Use netip.Addr as the map key for the driver's encryption map so the
values do not need to be converted to and from strings. Change the
encryption configuration methods to take netip types so the peerDB code
can pass netip values directly.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit d188df0039)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
8f0a803fc6 libnetwork/internal/setmatrix: make keys generic
Make the SetMatrix key's type generic so that e.g. netip.Addr values can
be used as matrix keys.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 0317f773a6)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
7d8c7c21f2 libnetwork/osl: stop tracking neighbor entries
The Namespace keeps some state for each inserted neighbor-table entry
which is used to delete the entry (and any related entries) given only
the IP and MAC address of the entry to delete. This state is not
strictly required as the retained data is a pure function of the
parameters passed to AddNeighbor(), and the kernel can inform us whether
an attempt to add a neighbor entry would conflict with an existing
entry. Get rid of the neighbor state in Namespace. It's just one more
piece of state that can cause lots of grief if it falls out of sync with
ground truth. Require callers to call DeleteNeighbor() with the same
arguments as they had passed to AddNeighbor(). Push the responsibility
for detecting attempts to insert conflicting entries into the neighbor
table onto the kernel by using (*netlink.Handle).NeighAdd() instead of
NeighSet().

Modernize the error messages and logging in DeleteNeighbor() and
AddNeighbor().

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 0d6e7cd983)

libn/d/overlay: delete FDB entry from AF_BRIDGE

Starting with commit 0d6e7cd983
DeleteNeighbor() needs to be called with the same options as the
AddNeighbor() call that created the neighbor entry. The calls in peerdb
were modified incorrectly, resulting in the deletes failing and leaking
neighbor entries. Fix up the DeleteNeighbor calls so that the FDB entry
is deleted from the FDB instead of the neighbor table, and the neighbor
is deleted from the neighbor table instead of the FDB.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 7a12bbe5d3)

Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
9cd4021dae libnetwork/osl: remove superfluous locks in Namespace
The isDefault and nlHandle fields are immutable once the Namespace is
constructed.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 9866738736)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
4d6c4e44d7 libn/osl: refactor func (*Namespace) AddNeighbor
Scope local variables as narrowly as possible.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit b6d76eb572)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
e5b652add3 libn/osl: drop unused AddNeighbor force parameter
func (*Namespace) AddNeighbor is only ever called with the force
parameter set to false. Remove the parameter and eliminate dead code.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 3bdf99d127)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:25 -04:00
Cory Snider
ca41647695 libn/d/overlay: drop miss flags from peerAddOp
as all callers unconditionally set them to false.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit a8e8a4cdad)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:24 -04:00
Cory Snider
199b2496e7 libnetwork/d/overlay: drop miss flags from peerAdd
as all callers unconditionally set them to false.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 6ee58c2d29)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:24 -04:00
Cory Snider
65ec8c89a6 libn/d/overlay: drop obsolete writeToStore comment
The writeToStore() call was removed from CreateNetwork in
commit 0fa873c0fe. The comment about
undoing the write is no longer applicable.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit d90277372f)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-08-11 15:13:24 -04:00
Austin Vazquez
c447682dee Merge pull request #50693 from corhere/backport-25.0/fix-frozen
[25.0 backport] Fix download-frozen-image-v2
2025-08-11 12:11:09 -07:00
Paweł Gronowski
a749f055d9 download-frozen-image-v2: Use curl -L
Passing the Auth to the redirected location was fixed in curl 7.58:
https://curl.se/changes.html#7_58_0 so we no longer need the extra
handling and can just use `-L` to let curl handle redirects.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2025-08-11 13:40:18 -04:00
Paweł Gronowski
5a12eaf718 download-frozen-image-v2: handle 307 responses without decimal
Correctly parse HTTP response that doesn't contain an HTTP version with a decimal place:

```
< HTTP/2 307
```

The previous version would only match strings like `HTTP/2.0 307`.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2025-08-11 13:40:18 -04:00
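The relaxed status-line match can be sketched in Go; `statusLine307` is an illustrative name, and the pattern simply makes the minor-version part of the HTTP-version token optional:

```go
package main

import (
	"fmt"
	"regexp"
)

// statusLine307 matches a 307 status line whether or not the
// HTTP-version token carries a decimal, e.g. both "HTTP/1.1 307"
// and "HTTP/2 307".
var statusLine307 = regexp.MustCompile(`HTTP/[0-9]+(\.[0-9]+)? 307`)

func main() {
	for _, line := range []string{"< HTTP/2 307", "< HTTP/2.0 307", "< HTTP/2 200"} {
		fmt.Println(line, "->", statusLine307.MatchString(line))
	}
}
```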
Cory Snider
59f062b233 Merge pull request #50511 from corhere/backport-25.0/libn/all-the-networkdb-fixes
[25.0] libnetwork/networkdb: backport all the fixes
2025-08-07 11:44:08 -04:00
Paweł Gronowski
842a9c522a Merge commit from fork
[25.0] Restore INC rules on firewalld reload
2025-07-29 10:01:01 +00:00
Rob Murray
651b2feb27 Restore INC iptables rules on firewalld reload
Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-07-28 12:06:58 -04:00
Cory Snider
a43c1eef18 Merge pull request #50445 from corhere/backport-25.0/fix-firewalld-reload
[25.0 backport] libnetwork/d/{bridge,overlay}: fix firewalld reload handling
2025-07-28 12:05:05 -04:00
Cory Snider
728de37428 libnetwork/networkdb: improve quality of randomness
The property test for the mRandomNodes function revealed that it may
sometimes pick out a sample of fewer than m nodes even when the number
of nodes to pick from (excluding the local node) is >= m. Rewrite it
using a random shuffle or permutation so that it always picks a
uniformly-distributed sample of the requested size whenever the
population is large enough.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit ac5f464649)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-07-25 16:20:34 -04:00
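The shuffle-based sampling can be sketched as follows; the signature of `mRandomNodes` here is an assumption for illustration, not the actual NetworkDB code:

```go
package main

import (
	"fmt"
	"math/rand"
)

// mRandomNodes returns up to m node IDs sampled uniformly at random,
// excluding the local node. Shuffling the candidate slice and taking a
// prefix always yields a full-size sample whenever enough peers exist,
// unlike a bounded rejection-sampling loop that may give up early.
func mRandomNodes(m int, nodes []string, local string) []string {
	candidates := make([]string, 0, len(nodes))
	for _, n := range nodes {
		if n != local {
			candidates = append(candidates, n)
		}
	}
	rand.Shuffle(len(candidates), func(i, j int) {
		candidates[i], candidates[j] = candidates[j], candidates[i]
	})
	if len(candidates) > m {
		candidates = candidates[:m]
	}
	return candidates
}

func main() {
	nodes := []string{"self", "a", "b", "c", "d"}
	sample := mRandomNodes(3, nodes, "self")
	fmt.Println(len(sample)) // always 3: four candidates remain after excluding "self"
}
```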
Cory Snider
5bf90ded7a libnetwork/networkdb: test quality of mRandomNodes
TestNetworkDBAlwaysConverges will occasionally find a failure where one
entry is missing on one node even after waiting a full five minutes. One
possible explanation is that the selection of nodes to gossip with is
biased in some way. Test that the mRandomNodes function picks a
uniformly distributed sample of node IDs of sufficient length.

The new test reveals that mRandomNodes may sometimes pick out a sample
of fewer than m nodes even when the number of nodes to pick from
(excluding the local node) is >= m. Put the test behind an xfail tag so
it is opt-in to run, without interfering with CI or bisecting.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 5799deb853)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-07-25 16:20:34 -04:00
Cory Snider
51d13163c5 libnetwork/networkdb: add convergence test
Add a property-based test which asserts that a cluster of NetworkDB
nodes always eventually converges to a consistent state. As this test
takes a long time to run, it is build-tagged to be excluded from CI.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit d8730dc1d3)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-07-25 16:20:34 -04:00
Cory Snider
9ca52f5fb9 libnetwork/networkdb: log encryption keys to file
Add a feature to NetworkDB to log the encryption keys to a file for the
Wireshark memberlist plugin to consume, configured using an environment
variable.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit ebfafa1561)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-07-25 16:20:33 -04:00
Cory Snider
ec820662de libn/networkdb: stop forging tombstone entries
When a node leaves a network, all entries owned by that node are
implicitly deleted. The other NetworkDB nodes handle the leave by
setting the deleted flag on the entries owned by the left node in their
local stores. This behaviour is problematic as it results in two
conflicting entries with the same Lamport timestamp propagating
through the cluster.

Consider two NetworkDB nodes, A, and B, which are both joined to some
network. Node A in quick succession leaves the network, immediately
rejoins it, then creates an entry. If Node B processes the
entry-creation event first, it will add the entry to its local store
then set the deleted flag upon processing the network-leave. No matter
how many times B bulk-syncs with A, B will ignore the live entry for
having the same timestamp as its local tombstone entry. Once this
situation occurs, the only way to recover is for the entry to get
updated by A with a new timestamp.

There is no need for a node to store forged tombstones for another
node's entries. All nodes will purge the entries naturally when they
process the network-leave or node-leave event. Simply delete the
non-owned entries from the local store so there is no inconsistent state
to interfere with convergence when nodes rejoin a network. Have nodes
update their local store with tombstones for entries when leaving a
network so that after a rapid leave-then-rejoin the entry deletions
propagate to nodes which may have missed the leave event.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 21d9109750)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-07-25 16:20:33 -04:00
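The leave handling can be sketched as plain map deletion; `entry` and `handleNetworkLeave` are illustrative names, not NetworkDB's actual types:

```go
package main

import "fmt"

type entry struct {
	owner string
	ltime uint64 // Lamport timestamp
}

// handleNetworkLeave drops the departed node's entries from the local
// store outright, rather than marking them deleted at their current
// Lamport time. A forged tombstone would carry the same timestamp as a
// live entry re-created after a rapid leave-then-rejoin, so bulk sync
// could never replace it.
func handleNetworkLeave(store map[string]entry, leftNode string) {
	for k, e := range store {
		if e.owner == leftNode {
			delete(store, k)
		}
	}
}

func main() {
	store := map[string]entry{
		"svc/a": {owner: "A", ltime: 7},
		"svc/b": {owner: "B", ltime: 3},
	}
	handleNetworkLeave(store, "A")
	fmt.Println(len(store)) // only B's entry remains
}
```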
Cory Snider
f3f1e091a8 libnetwork/networkdb: fix broadcast queue deadlocks
NetworkDB's JoinNetwork function enqueues a message onto a
TransmitLimitedQueue while holding the NetworkDB mutex locked for
writing. The TransmitLimitedQueue has its own synchronization;
it locks its mutex when enqueueing a message. Locking order:
  1. (NetworkDB).RWMutex.Lock()
  2. (TransmitLimitedQueue).mu.Lock()

NetworkDB's gossip periodic task calls GetBroadcasts on the same
TransmitLimitedQueue to retrieve the enqueued messages. GetBroadcasts
invokes the queue's NumNodes callback while the mutex is locked. The
NumNodes callback function that NetworkDB sets locks the NetworkDB mutex
for reading to take the length of the nodes map. Locking order:
  1. (TransmitLimitedQueue).mu.Lock()
  2. (NetworkDB).RWMutex.RLock()

If one goroutine calls GetBroadcasts on the queue concurrently with
another goroutine calling JoinNetwork on the NetworkDB, the goroutines
may deadlock due to the lock inversion.

Fix the deadlock by caching the number of nodes in an atomic variable so
that the NumNodes callback can load the value without blocking or
violating Go's memory model. And fix a similar deadlock situation with
the table-event broadcast queues.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 08bde5edfa)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-07-25 16:20:33 -04:00
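The atomic node-count cache can be sketched as follows; the `db` type and its field names are hypothetical stand-ins for NetworkDB's internals:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// db caches the cluster size in an atomic so the queue's NumNodes
// callback can read it without taking the database mutex, breaking
// the lock-order inversion between the two mutexes.
type db struct {
	mu       sync.RWMutex
	nodes    map[string]struct{}
	estNodes atomic.Int64 // cached len(nodes), updated under mu
}

func (d *db) addNode(id string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.nodes[id] = struct{}{}
	d.estNodes.Store(int64(len(d.nodes)))
}

// numNodes is safe to call from inside the broadcast queue's own
// critical section because it takes no locks.
func (d *db) numNodes() int { return int(d.estNodes.Load()) }

func main() {
	d := &db{nodes: map[string]struct{}{}}
	d.addNode("a")
	d.addNode("b")
	fmt.Println(d.numNodes())
}
```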
Cory Snider
16dc168388 libn/networkdb: make TestNetworkDBIslands not flaky
With rejoinClusterBootStrap fixed in tests, split clusters should
reliably self-heal in tests as well as production. Work around the other
source of flakiness in TestNetworkDBIslands: timing out waiting for a
failed node to transition to gracefully left. This flake happens when
one of the leaving nodes sends its NodeLeft message to the other leaving
node, and the second is shut down before it has a chance to rebroadcast
the message to the remaining nodes. The proper fix would be to leverage
memberlist's own bookkeeping instead of duplicating it poorly with user
messages, but doing so requires a change in the memberlist module.
Instead, have the test check that the sum of failed+left nodes matches
the expected total, rather than waiting for all nodes to reach
failed==3 && left==0.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit aff444df86)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-07-25 16:20:33 -04:00
Cory Snider
12aaf29287 libn/networkdb: prevent spurious rejoins in tests
The rejoinClusterBootStrap periodic task rejoins with the bootstrap
nodes if none of them are members of the cluster. It correlates the
cluster nodes with the bootstrap list by comparing IP addresses,
ignoring ports. In normal operation this works out fine as every node
has a unique IP address, but in unit tests every node listens on a
distinct port on 127.0.0.1. This situation causes the check to
incorrectly filter out all nodes from the list, mistaking them for the
local node.

Filter out the local node using pointer equality of the *node to avoid
any ambiguity. Correlate the remote nodes by IP:port so that the check
behaves the same in tests and in production.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 1e1be54d3e)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-07-25 16:20:33 -04:00
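The corrected correlation logic can be sketched like this; `needsRejoin` and the `node` struct are illustrative, not the actual rejoinClusterBootStrap code:

```go
package main

import "fmt"

type node struct{ addr string } // addr is "IP:port"

// needsRejoin reports whether none of the bootstrap addresses is a
// current cluster member. The local node is skipped by pointer
// identity, and peers are matched on the full IP:port, so unit-test
// clusters sharing 127.0.0.1 behave the same as production ones.
func needsRejoin(members []*node, local *node, bootstrap []string) bool {
	for _, m := range members {
		if m == local {
			continue // unambiguous: pointer equality, not address comparison
		}
		for _, b := range bootstrap {
			if m.addr == b {
				return false
			}
		}
	}
	return true
}

func main() {
	local := &node{addr: "127.0.0.1:10001"}
	peer := &node{addr: "127.0.0.1:10002"}
	members := []*node{local, peer}
	fmt.Println(needsRejoin(members, local, []string{"127.0.0.1:10002"})) // a bootstrap peer is a member
	fmt.Println(needsRejoin(members, local, []string{"127.0.0.1:10001"})) // only the local node matches
}
```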
Cory Snider
ca5250dc9f libnetwork/networkdb: prioritize local broadcasts
A network node is responsible for both broadcasting table events for
entries it owns and for rebroadcasting table events from other nodes it
has received. Table events to be broadcast are added to a single queue
per network, including events for rebroadcasting. As the memberlist
TransmitLimitedQueue is (to a first approximation) LIFO, a flood of
events from other nodes could delay the broadcasting of
locally-generated events indefinitely. Prioritize broadcasting local
events by splitting up the queues and only pulling from the rebroadcast
queue if there is free space in the gossip packet after draining the
local-broadcast queue.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 6ec6e0991a)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-07-25 16:20:33 -04:00
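A minimal sketch of the two-queue draining order, with byte slices standing in for memberlist broadcasts (all names here are hypothetical):

```go
package main

import "fmt"

// getBroadcasts drains the local-event queue first and only fills any
// remaining space in the gossip packet from the rebroadcast queue, so
// a flood of peer events cannot starve locally-generated ones.
func getBroadcasts(limit int, local, rebroadcast [][]byte) [][]byte {
	var out [][]byte
	used := 0
	take := func(q [][]byte) [][]byte {
		for len(q) > 0 && used+len(q[0]) <= limit {
			out = append(out, q[0])
			used += len(q[0])
			q = q[1:]
		}
		return q
	}
	local = take(local)       // local broadcasts always go first
	rebroadcast = take(rebroadcast) // rebroadcasts only fill leftover space
	return out
}

func main() {
	local := [][]byte{[]byte("L1"), []byte("L2")}
	peer := [][]byte{[]byte("P1"), []byte("P2"), []byte("P3")}
	msgs := getBroadcasts(6, local, peer)
	fmt.Println(len(msgs)) // L1+L2 (4 bytes) leave room for one 2-byte peer message
}
```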
Cory Snider
c912e5278b libnetwork/networkdb: improve TestCRUDTableEntries
Log more details when assertions fail, and log the state of each
NetworkDB instance at various points in TestCRUDTableEntries, to give a
more complete picture of what went wrong when the test fails.

Increase the global logger verbosity in tests so warnings and debug logs
are printed to the test log.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit e9a7154909)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-07-25 16:20:33 -04:00
Cory Snider
e9ed499888 libn/networkdb: use distinct type for own networks
NetworkDB uses a multi-dimensional map of struct network to keep track of
network attachments for both remote nodes and the local node. Only a
subset of the struct fields are used for remote nodes' network
attachments. The tableBroadcasts pointer field in particular is
always initialized for network values representing local attachments
(read: nDB.networks[nDB.config.NodeID]) and always nil for remote
attachments. Consequently, unnecessary defensive nil-pointer checks are
peppered throughout the code despite the aforementioned invariant.

Enshrine the invariant that tableBroadcasts is initialized iff the
network attachment is for the local node in the type system. Pare down
struct network to only the fields needed for remote network attachments
and move the local-only fields into a new struct thisNodeNetwork. Elide
the unnecessary nil-checks.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit dbb0d88109)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-07-25 16:20:33 -04:00
Cory Snider
6856a17655 libnetwork/networkdb: don't clear queue on rejoin
When joining a network that was previously joined but not yet reaped,
NetworkDB replaces the network struct value with a zeroed-out one with
the entries count copied over. This is also the case when joining a
network that is currently joined! Consequently, joining a network has
the side effect of clearing the broadcast queue. If the queue is cleared
while messages are still pending broadcast, convergence may be delayed
until the next bulk sync cycle.

Make it an error to join a network twice without leaving. Retain the
existing broadcast queue when rejoining a network that has not yet been
reaped.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 51f31826ee)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-07-25 16:20:33 -04:00
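The join guard and queue retention can be sketched as follows; `netState` and `joinNetwork` are simplified stand-ins for NetworkDB's network bookkeeping:

```go
package main

import (
	"errors"
	"fmt"
)

type netState struct {
	leaving bool     // left but not yet reaped
	queue   []string // pending broadcasts (simplified)
}

// joinNetwork errors on a duplicate join and, when rejoining a network
// that was left but not yet reaped, keeps the existing broadcast queue
// instead of replacing the state with a zeroed-out value.
func joinNetwork(networks map[string]*netState, nid string) error {
	if n, ok := networks[nid]; ok {
		if !n.leaving {
			return errors.New("already joined network " + nid)
		}
		n.leaving = false // retain n.queue as-is
		return nil
	}
	networks[nid] = &netState{}
	return nil
}

func main() {
	nets := map[string]*netState{}
	_ = joinNetwork(nets, "n1")
	nets["n1"].queue = append(nets["n1"].queue, "pending")
	nets["n1"].leaving = true // left, awaiting reap
	_ = joinNetwork(nets, "n1")
	fmt.Println(len(nets["n1"].queue)) // the queue survives the rejoin
}
```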
Cory Snider
31f4c5914e libnetwork/networkdb: drop id field from network
The map key for nDB.networks is the network ID. The struct field is not
actually used anywhere in practice.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 30b27ab6ea)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-07-25 16:20:33 -04:00
Cory Snider
35f7b1d7c9 libn/networkdb: take most tests off flaky list
The loopback-test fixes seem to be sufficient to resolve the flakiness
of all the tests aside from TestFlakyNetworkDBIslands.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 697c17ca95)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-07-25 16:20:33 -04:00
Cory Snider
743a0df9ec libnetwork/networkdb: always shut down memberlist
Gracefully leaving the memberlist cluster is a best-effort operation.
Failing to successfully broadcast the leave message to a peer should not
prevent NetworkDB from cleaning up the memberlist instance on close. But
that was not the case in practice. Log the error returned from
(*memberlist.Memberlist).Leave instead of returning it and proceed with
shutting down irrespective of whether Leave() returns an error.

Signed-off-by: Cory Snider <csnider@mirantis.com>
(cherry picked from commit 16ed51d864)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-07-25 16:20:33 -04:00
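The best-effort close can be sketched against a minimal interface; `cluster` and `closeCluster` are hypothetical names, with a fake standing in for a real memberlist instance:

```go
package main

import (
	"fmt"
	"time"
)

type cluster interface {
	Leave(timeout time.Duration) error
	Shutdown() error
}

// closeCluster treats a failed Leave as best-effort: the error is
// logged rather than returned, and Shutdown always runs so the
// instance is cleaned up regardless.
func closeCluster(c cluster) error {
	if err := c.Leave(2 * time.Second); err != nil {
		fmt.Println("warn: failed to broadcast leave:", err)
	}
	return c.Shutdown()
}

type fake struct{ down bool }

func (f *fake) Leave(time.Duration) error { return fmt.Errorf("no peers reachable") }
func (f *fake) Shutdown() error           { f.down = true; return nil }

func main() {
	f := &fake{}
	_ = closeCluster(f)
	fmt.Println(f.down) // shutdown ran despite the Leave error
}
```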
Matthieu MOREL
bacba3726f fix redefines-builtin-id from revive
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-07-25 16:20:29 -04:00
Andrey Epifanov
f93d90cee3 overlay: Reload Ingress iptables rules in swarm mode
Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
(cherry picked from commit a1f68bf5a6)
Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
2025-07-25 15:17:16 -04:00
Andrey Epifanov
00232ac981 libnetwork: split programIngress() and dependent functions into Add and Del functions
- refactor programIngressPorts to use Rule.Insert/Append/Delete for improved rule management
- split programIngress() and dependent functions into Add and Del functions

Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
(cherry picked from commit 8b208f1b95)
Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
2025-07-25 15:17:16 -04:00
Andrey Epifanov
88d0ed889d libnetwork: refactor ingress chain management for improved rule handling and initialization
Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
(cherry picked from commit 50e6f4c4cb)
Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
2025-07-25 15:17:16 -04:00
Andrey Epifanov
32c814a85f libnetwork: add FlushChain methods for improved iptables management
Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
(cherry picked from commit 4f0485e45f)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-07-25 15:17:16 -04:00
Andrey Epifanov
fb8e5d85f6 libnetwork: refactor rule management to use Ensure method for Append and Insert operations
Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
(cherry picked from commit 262c32565b)
Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
2025-07-25 15:15:50 -04:00
Andrey Epifanov
fb6695de75 libnetwork: refactor iptable functions to include table parameter for improved rule management
Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
(cherry picked from commit 19a8083866)
Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
2025-07-25 15:15:50 -04:00
Andrey Epifanov
089d70f3c8 libnetwork: extract plumpIngressProxy steps in a separate function
- Extract plumpIngressProxy steps in a separate function
- Don't create a new listener if there's already one in ingressProxyTbl

Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
(cherry picked from commit c2e2e7fe24)
Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
2025-07-25 15:15:50 -04:00
Andrey Epifanov
2710c239df libnetwork: extract programIngressPorts steps in a separate functions
Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
(cherry picked from commit 51ed289b06)
Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
2025-07-25 15:15:50 -04:00
Andrey Epifanov
7982904677 libnetwork: extract creation/initiation of INGRESS-DOCKER chains in separate function
Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
(cherry picked from commit 752758ae77)
Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
2025-07-25 15:15:50 -04:00
Rob Murray
fbffa88b76 Restore legacy links along with other iptables rules
On firewalld reload, all the iptables rules are deleted. Legacy
links use iptables.OnReloaded to re-create their rules - but there's
no way to delete an OnReloaded callback. So, a firewalld reload
after the linked containers are deleted results in zombie rules
being re-created.

Legacy links are created by ProgramExternalConnectivity, but
removed in Leave (rather than RevokeExternalConnectivity).

So, restore legacy links for current endpoints, along with the
other per-network/per-port rules.

Move link-removal to RevokeExternalConnectivity, so that it
happens with the configNetwork lock held.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-07-25 15:15:50 -04:00
Rob Murray
41f080df25 Restore iptables for current networks on firewalld reload
Using iptables.OnReloaded to restore individual per-network rules
on firewalld reload means rules for deleted networks pop back into
existence (because there was no way to delete the callbacks on
network-delete).

So, on firewalld reload, walk over current networks and ask them
to restore their iptables rules.

Signed-off-by: Rob Murray <rob.murray@docker.com>
(cherry picked from commit a527e5a546)

Test that firewalld reload doesn't re-create deleted iptables rules

Signed-off-by: Rob Murray <rob.murray@docker.com>
(cherry picked from commit c3fa7c1779)

Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
2025-07-25 15:15:50 -04:00
Rob Murray
c64e8a8117 Add test util "WithPortMap"
Backport the WithPortMap() function through a partial cherry-pick.

Signed-off-by: Rob Murray <rob.murray@docker.com>
(cherry picked from commit 20c99e4156)
Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
2025-07-25 15:15:50 -04:00
Albin Kerouanton
0316eaaa23 Install and run firewalld for CI's firewalld tests
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
(cherry picked from commit 8883db20c5)
Signed-off-by: Rob Murray <rob.murray@docker.com>
(cherry picked from commit adfed82ab8)
Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
2025-07-25 15:15:50 -04:00
Rob Murray
270166cbe5 Fix TestPassthrough
Doesn't look like it would ever have worked, but:
- init the dbus connection to avoid a segv
- include the chain name when creating the rule
- remove the test rule if it's created

Signed-off-by: Rob Murray <rob.murray@docker.com>
(cherry picked from commit 0ab6f07c31)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-07-25 15:15:50 -04:00
Rob Murray
a012739c2c Add test util "FirewalldRunning"
Signed-off-by: Rob Murray <rob.murray@docker.com>
(cherry picked from commit b8cacdf324)
Signed-off-by: Andrey Epifanov <aepifanov@mirantis.com>
2025-07-25 14:46:24 -04:00
Matthieu MOREL
e53cf6bc02 fix(ST1016): Use consistent method receiver names
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
(cherry picked from commit 70139978d3)
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-07-24 14:01:41 -04:00
199 changed files with 14313 additions and 2818 deletions

View File

@@ -41,6 +41,25 @@ jobs:
-
name: Set up runner
uses: ./.github/actions/setup-runner
-
name: Prepare
run: |
CACHE_DEV_SCOPE=dev
if [[ "${{ matrix.mode }}" == *"rootless"* ]]; then
# In rootless mode, tests will run in the host's namespace not the rootlesskit
# namespace. So, probably no different to non-rootless unit tests and can be
# removed from the test matrix.
echo "DOCKER_ROOTLESS=1" >> $GITHUB_ENV
fi
if [[ "${{ matrix.mode }}" == *"firewalld"* ]]; then
echo "FIREWALLD=true" >> $GITHUB_ENV
CACHE_DEV_SCOPE="${CACHE_DEV_SCOPE}firewalld"
fi
if [[ "${{ matrix.mode }}" == *"systemd"* ]]; then
echo "SYSTEMD=true" >> $GITHUB_ENV
CACHE_DEV_SCOPE="${CACHE_DEV_SCOPE}systemd"
fi
echo "CACHE_DEV_SCOPE=${CACHE_DEV_SCOPE}" >> $GITHUB_ENV
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
@@ -204,6 +223,7 @@ jobs:
- ""
- rootless
- systemd
- firewalld
#- rootless-systemd FIXME: https://github.com/moby/moby/issues/44084
exclude:
- os: ubuntu-24.04 # FIXME: https://github.com/moby/moby/pull/49579#issuecomment-2698622223
@@ -229,6 +249,10 @@ jobs:
echo "SYSTEMD=true" >> $GITHUB_ENV
CACHE_DEV_SCOPE="${CACHE_DEV_SCOPE}systemd"
fi
if [[ "${{ matrix.mode }}" == *"firewalld"* ]]; then
echo "FIREWALLD=true" >> $GITHUB_ENV
CACHE_DEV_SCOPE="${CACHE_DEV_SCOPE}firewalld"
fi
echo "CACHE_DEV_SCOPE=${CACHE_DEV_SCOPE}" >> $GITHUB_ENV
-
name: Set up Docker Buildx
@@ -372,6 +396,15 @@ jobs:
-
name: Set up tracing
uses: ./.github/actions/setup-tracing
-
name: Prepare
run: |
CACHE_DEV_SCOPE=dev
if [[ "${{ matrix.mode }}" == *"firewalld"* ]]; then
echo "FIREWALLD=true" >> $GITHUB_ENV
CACHE_DEV_SCOPE="${CACHE_DEV_SCOPE}firewalld"
fi
echo "CACHE_DEV_SCOPE=${CACHE_DEV_SCOPE}" >> $GITHUB_ENV
-
name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3

View File

@@ -32,8 +32,8 @@ env:
GOTESTLIST_VERSION: v0.3.1
TESTSTAT_VERSION: v0.1.25
WINDOWS_BASE_IMAGE: mcr.microsoft.com/windows/servercore
WINDOWS_BASE_TAG_2019: ltsc2019
WINDOWS_BASE_TAG_2022: ltsc2022
WINDOWS_BASE_TAG_2025: ltsc2025
TEST_IMAGE_NAME: moby:test
TEST_CTN_NAME: moby
DOCKER_BUILDKIT: 0
@@ -65,8 +65,8 @@ jobs:
run: |
New-Item -ItemType "directory" -Path "${{ github.workspace }}\go-build"
New-Item -ItemType "directory" -Path "${{ github.workspace }}\go\pkg\mod"
If ("${{ inputs.os }}" -eq "windows-2019") {
echo "WINDOWS_BASE_IMAGE_TAG=${{ env.WINDOWS_BASE_TAG_2019 }}" | Out-File -FilePath $Env:GITHUB_ENV -Encoding utf-8 -Append
If ("${{ inputs.os }}" -eq "windows-2025") {
echo "WINDOWS_BASE_IMAGE_TAG=${{ env.WINDOWS_BASE_TAG_2025 }}" | Out-File -FilePath $Env:GITHUB_ENV -Encoding utf-8 -Append
} ElseIf ("${{ inputs.os }}" -eq "windows-2022") {
echo "WINDOWS_BASE_IMAGE_TAG=${{ env.WINDOWS_BASE_TAG_2022 }}" | Out-File -FilePath $Env:GITHUB_ENV -Encoding utf-8 -Append
}
@@ -145,8 +145,8 @@ jobs:
New-Item -ItemType "directory" -Path "${{ github.workspace }}\go-build"
New-Item -ItemType "directory" -Path "${{ github.workspace }}\go\pkg\mod"
New-Item -ItemType "directory" -Path "bundles"
If ("${{ inputs.os }}" -eq "windows-2019") {
echo "WINDOWS_BASE_IMAGE_TAG=${{ env.WINDOWS_BASE_TAG_2019 }}" | Out-File -FilePath $Env:GITHUB_ENV -Encoding utf-8 -Append
If ("${{ inputs.os }}" -eq "windows-2025") {
echo "WINDOWS_BASE_IMAGE_TAG=${{ env.WINDOWS_BASE_TAG_2025 }}" | Out-File -FilePath $Env:GITHUB_ENV -Encoding utf-8 -Append
} ElseIf ("${{ inputs.os }}" -eq "windows-2022") {
echo "WINDOWS_BASE_IMAGE_TAG=${{ env.WINDOWS_BASE_TAG_2022 }}" | Out-File -FilePath $Env:GITHUB_ENV -Encoding utf-8 -Append
}
@@ -319,8 +319,8 @@ jobs:
name: Init
run: |
New-Item -ItemType "directory" -Path "bundles"
If ("${{ inputs.os }}" -eq "windows-2019") {
echo "WINDOWS_BASE_IMAGE_TAG=${{ env.WINDOWS_BASE_TAG_2019 }}" | Out-File -FilePath $Env:GITHUB_ENV -Encoding utf-8 -Append
If ("${{ inputs.os }}" -eq "windows-2025") {
echo "WINDOWS_BASE_IMAGE_TAG=${{ env.WINDOWS_BASE_TAG_2025 }}" | Out-File -FilePath $Env:GITHUB_ENV -Encoding utf-8 -Append
} ElseIf ("${{ inputs.os }}" -eq "windows-2022") {
echo "WINDOWS_BASE_IMAGE_TAG=${{ env.WINDOWS_BASE_TAG_2022 }}" | Out-File -FilePath $Env:GITHUB_ENV -Encoding utf-8 -Append
}

View File

@@ -86,7 +86,7 @@ jobs:
targets: dev
set: |
*.cache-from=type=gha,scope=dev-arm64
*.cache-to=type=gha,scope=dev-arm64,mode=max
*.cache-to=type=gha,scope=dev-arm64
*.output=type=cacheonly
test-unit:

View File

@@ -41,6 +41,7 @@ jobs:
mode:
- ""
- systemd
- firewalld
steps:
-
name: Prepare
@@ -58,7 +59,7 @@ jobs:
targets: dev
set: |
*.cache-from=type=gha,scope=dev${{ matrix.mode }}
*.cache-to=type=gha,scope=dev${{ matrix.mode }},mode=max
*.cache-to=type=gha,scope=dev${{ matrix.mode }}
*.output=type=cacheonly
test:

View File

@@ -14,12 +14,9 @@ concurrency:
cancel-in-progress: true
on:
schedule:
- cron: '0 10 * * *'
workflow_dispatch:
push:
branches:
- 'master'
- '[0-9]+.[0-9]+'
pull_request:
jobs:
validate-dco:

View File

@@ -1,4 +1,4 @@
name: windows-2019
name: windows-2025
# Default to 'contents: read', which grants actions to read commits.
#
@@ -14,9 +14,13 @@ concurrency:
cancel-in-progress: true
on:
schedule:
- cron: '0 10 * * *'
workflow_dispatch:
push:
branches:
- 'master'
- '[0-9]+.[0-9]+'
- '[0-9]+.x'
pull_request:
jobs:
validate-dco:
@@ -37,6 +41,6 @@ jobs:
matrix:
storage: ${{ fromJson(needs.test-prepare.outputs.matrix) }}
with:
os: windows-2019
os: windows-2025
storage: ${{ matrix.storage }}
send_coverage: false

View File

@@ -61,6 +61,7 @@ linters-settings:
# FIXME make sure all packages have a description. Currently, there's many packages without.
- name: package-comments
disabled: true
- name: redefines-builtin-id
issues:
# The default exclusion rules are a bit too permissive, so copying the relevant ones below
exclude-use-default: false

View File

@@ -16,6 +16,7 @@ ARG BUILDX_VERSION=0.12.1
ARG COMPOSE_VERSION=v2.24.5
ARG SYSTEMD="false"
ARG FIREWALLD="false"
ARG DOCKER_STATIC=1
# REGISTRY_VERSION specifies the version of the registry to download from
@@ -500,7 +501,16 @@ RUN --mount=type=cache,sharing=locked,id=moby-dev-aptlib,target=/var/lib/apt \
systemd-sysv
ENTRYPOINT ["hack/dind-systemd"]
FROM dev-systemd-${SYSTEMD} AS dev-base
FROM dev-systemd-${SYSTEMD} AS dev-firewalld-false
FROM dev-systemd-true AS dev-firewalld-true
RUN --mount=type=cache,sharing=locked,id=moby-dev-aptlib,target=/var/lib/apt \
--mount=type=cache,sharing=locked,id=moby-dev-aptcache,target=/var/cache/apt \
apt-get update && apt-get install -y --no-install-recommends \
firewalld
RUN sed -i 's/FirewallBackend=nftables/FirewallBackend=iptables/' /etc/firewalld/firewalld.conf
FROM dev-firewalld-${FIREWALLD} AS dev-base
RUN groupadd -r docker
RUN useradd --create-home --gid docker unprivilegeduser \
&& mkdir -p /home/unprivilegeduser/.local/share/docker \

View File

@@ -255,14 +255,11 @@ RUN `
Remove-Item C:\gitsetup.zip; `
`
Write-Host INFO: Downloading containerd; `
Install-Package -Force 7Zip4PowerShell; `
$location='https://github.com/containerd/containerd/releases/download/'+$Env:CONTAINERD_VERSION+'/containerd-'+$Env:CONTAINERD_VERSION.TrimStart('v')+'-windows-amd64.tar.gz'; `
Download-File $location C:\containerd.tar.gz; `
New-Item -Path C:\containerd -ItemType Directory; `
Expand-7Zip C:\containerd.tar.gz C:\; `
Expand-7Zip C:\containerd.tar C:\containerd; `
tar -xzf C:\containerd.tar.gz -C C:\containerd; `
Remove-Item C:\containerd.tar.gz; `
Remove-Item C:\containerd.tar; `
`
# Ensure all directories exist that we will require below....
$srcDir = """$Env:GOPATH`\src\github.com\docker\docker\bundles"""; `

View File

@@ -56,6 +56,7 @@ DOCKER_ENVS := \
-e DOCKER_USERLANDPROXY \
-e DOCKERD_ARGS \
-e DELVE_PORT \
-e FIREWALLD \
-e GITHUB_ACTIONS \
-e TEST_FORCE_VALIDATE \
-e TEST_INTEGRATION_DIR \
@@ -149,6 +150,9 @@ DOCKER_BUILD_ARGS += --build-arg=DOCKERCLI_INTEGRATION_REPOSITORY
ifdef DOCKER_SYSTEMD
DOCKER_BUILD_ARGS += --build-arg=SYSTEMD=true
endif
ifdef FIREWALLD
DOCKER_BUILD_ARGS += --build-arg=FIREWALLD=true
endif
BUILD_OPTS := ${DOCKER_BUILD_ARGS} ${DOCKER_BUILD_OPTS}
BUILD_CMD := $(BUILDX) build

View File

@@ -25,15 +25,15 @@ func NewRouter(b Backend, d experimentalProvider) router.Router {
}
// Routes returns the available routers to the build controller
func (r *buildRouter) Routes() []router.Route {
return r.routes
func (br *buildRouter) Routes() []router.Route {
return br.routes
}
func (r *buildRouter) initRoutes() {
r.routes = []router.Route{
router.NewPostRoute("/build", r.postBuild),
router.NewPostRoute("/build/prune", r.postPrune),
router.NewPostRoute("/build/cancel", r.postCancel),
func (br *buildRouter) initRoutes() {
br.routes = []router.Route{
router.NewPostRoute("/build", br.postBuild),
router.NewPostRoute("/build/prune", br.postPrune),
router.NewPostRoute("/build/cancel", br.postCancel),
}
}

View File

@@ -23,14 +23,14 @@ func NewRouter(b Backend, decoder httputils.ContainerDecoder) router.Router {
}
// Routes returns the available routers to the checkpoint controller
func (r *checkpointRouter) Routes() []router.Route {
return r.routes
func (cr *checkpointRouter) Routes() []router.Route {
return cr.routes
}
func (r *checkpointRouter) initRoutes() {
r.routes = []router.Route{
router.NewGetRoute("/containers/{name:.*}/checkpoints", r.getContainerCheckpoints, router.Experimental),
router.NewPostRoute("/containers/{name:.*}/checkpoints", r.postContainerCheckpoint, router.Experimental),
router.NewDeleteRoute("/containers/{name}/checkpoints/{checkpoint}", r.deleteContainerCheckpoint, router.Experimental),
func (cr *checkpointRouter) initRoutes() {
cr.routes = []router.Route{
router.NewGetRoute("/containers/{name:.*}/checkpoints", cr.getContainerCheckpoints, router.Experimental),
router.NewPostRoute("/containers/{name:.*}/checkpoints", cr.postContainerCheckpoint, router.Experimental),
router.NewDeleteRoute("/containers/{name}/checkpoints/{checkpoint}", cr.deleteContainerCheckpoint, router.Experimental),
}
}

View File

@@ -8,7 +8,7 @@ import (
"github.com/docker/docker/api/types/checkpoint"
)
func (s *checkpointRouter) postContainerCheckpoint(ctx context.Context, w http.ResponseWriter, r *http.Request, vars map[string]string) error {
func (cr *checkpointRouter) postContainerCheckpoint(ctx context.Context, w http.ResponseWriter, r *http.Request, vars map[string]string) error {
if err := httputils.ParseForm(r); err != nil {
return err
}
@@ -18,7 +18,7 @@ func (s *checkpointRouter) postContainerCheckpoint(ctx context.Context, w http.R
return err
}
err := s.backend.CheckpointCreate(vars["name"], options)
err := cr.backend.CheckpointCreate(vars["name"], options)
if err != nil {
return err
}
@@ -27,12 +27,12 @@ func (s *checkpointRouter) postContainerCheckpoint(ctx context.Context, w http.R
return nil
}
func (s *checkpointRouter) getContainerCheckpoints(ctx context.Context, w http.ResponseWriter, r *http.Request, vars map[string]string) error {
func (cr *checkpointRouter) getContainerCheckpoints(ctx context.Context, w http.ResponseWriter, r *http.Request, vars map[string]string) error {
if err := httputils.ParseForm(r); err != nil {
return err
}
checkpoints, err := s.backend.CheckpointList(vars["name"], checkpoint.ListOptions{
checkpoints, err := cr.backend.CheckpointList(vars["name"], checkpoint.ListOptions{
CheckpointDir: r.Form.Get("dir"),
})
if err != nil {
@@ -42,12 +42,12 @@ func (s *checkpointRouter) getContainerCheckpoints(ctx context.Context, w http.R
return httputils.WriteJSON(w, http.StatusOK, checkpoints)
}
func (s *checkpointRouter) deleteContainerCheckpoint(ctx context.Context, w http.ResponseWriter, r *http.Request, vars map[string]string) error {
func (cr *checkpointRouter) deleteContainerCheckpoint(ctx context.Context, w http.ResponseWriter, r *http.Request, vars map[string]string) error {
if err := httputils.ParseForm(r); err != nil {
return err
}
err := s.backend.CheckpointDelete(vars["name"], checkpoint.DeleteOptions{
err := cr.backend.CheckpointDelete(vars["name"], checkpoint.DeleteOptions{
CheckpointDir: r.Form.Get("dir"),
CheckpointID: vars["checkpoint"],
})

@@ -18,14 +18,14 @@ func NewRouter(backend Backend) router.Router {
}
// Routes returns the available routes
func (r *distributionRouter) Routes() []router.Route {
return r.routes
func (dr *distributionRouter) Routes() []router.Route {
return dr.routes
}
// initRoutes initializes the routes in the distribution router
func (r *distributionRouter) initRoutes() {
r.routes = []router.Route{
func (dr *distributionRouter) initRoutes() {
dr.routes = []router.Route{
// GET
router.NewGetRoute("/distribution/{name:.*}/json", r.getDistributionInfo),
router.NewGetRoute("/distribution/{name:.*}/json", dr.getDistributionInfo),
}
}

@@ -17,7 +17,7 @@ import (
"github.com/pkg/errors"
)
func (s *distributionRouter) getDistributionInfo(ctx context.Context, w http.ResponseWriter, r *http.Request, vars map[string]string) error {
func (dr *distributionRouter) getDistributionInfo(ctx context.Context, w http.ResponseWriter, r *http.Request, vars map[string]string) error {
if err := httputils.ParseForm(r); err != nil {
return err
}
@@ -43,7 +43,7 @@ func (s *distributionRouter) getDistributionInfo(ctx context.Context, w http.Res
// For a search it is not an error if no auth was given. Ignore invalid
// AuthConfig to increase compatibility with the existing API.
authConfig, _ := registry.DecodeAuthConfig(r.Header.Get(registry.AuthHeader))
repos, err := s.backend.GetRepositories(ctx, namedRef, authConfig)
repos, err := dr.backend.GetRepositories(ctx, namedRef, authConfig)
if err != nil {
return err
}
@@ -64,7 +64,7 @@ func (s *distributionRouter) getDistributionInfo(ctx context.Context, w http.Res
// - https://github.com/moby/moby/blob/12c7411b6b7314bef130cd59f1c7384a7db06d0b/distribution/pull.go#L76-L152
var lastErr error
for _, repo := range repos {
distributionInspect, err := s.fetchManifest(ctx, repo, namedRef)
distributionInspect, err := dr.fetchManifest(ctx, repo, namedRef)
if err != nil {
lastErr = err
continue
@@ -74,7 +74,7 @@ func (s *distributionRouter) getDistributionInfo(ctx context.Context, w http.Res
return lastErr
}
func (s *distributionRouter) fetchManifest(ctx context.Context, distrepo distribution.Repository, namedRef reference.Named) (registry.DistributionInspect, error) {
func (dr *distributionRouter) fetchManifest(ctx context.Context, distrepo distribution.Repository, namedRef reference.Named) (registry.DistributionInspect, error) {
var distributionInspect registry.DistributionInspect
if canonicalRef, ok := namedRef.(reference.Canonical); !ok {
namedRef = reference.TagNameOnly(namedRef)

@@ -22,22 +22,22 @@ func NewRouter(b Backend, c ClusterBackend) router.Router {
}
// Routes returns the available routes to the network controller
func (r *networkRouter) Routes() []router.Route {
return r.routes
func (n *networkRouter) Routes() []router.Route {
return n.routes
}
func (r *networkRouter) initRoutes() {
r.routes = []router.Route{
func (n *networkRouter) initRoutes() {
n.routes = []router.Route{
// GET
router.NewGetRoute("/networks", r.getNetworksList),
router.NewGetRoute("/networks/", r.getNetworksList),
router.NewGetRoute("/networks/{id:.+}", r.getNetwork),
router.NewGetRoute("/networks", n.getNetworksList),
router.NewGetRoute("/networks/", n.getNetworksList),
router.NewGetRoute("/networks/{id:.+}", n.getNetwork),
// POST
router.NewPostRoute("/networks/create", r.postNetworkCreate),
router.NewPostRoute("/networks/{id:.*}/connect", r.postNetworkConnect),
router.NewPostRoute("/networks/{id:.*}/disconnect", r.postNetworkDisconnect),
router.NewPostRoute("/networks/prune", r.postNetworksPrune),
router.NewPostRoute("/networks/create", n.postNetworkCreate),
router.NewPostRoute("/networks/{id:.*}/connect", n.postNetworkConnect),
router.NewPostRoute("/networks/{id:.*}/disconnect", n.postNetworkDisconnect),
router.NewPostRoute("/networks/prune", n.postNetworksPrune),
// DELETE
router.NewDeleteRoute("/networks/{id:.*}", r.deleteNetwork),
router.NewDeleteRoute("/networks/{id:.*}", n.deleteNetwork),
}
}

@@ -18,22 +18,22 @@ func NewRouter(b Backend) router.Router {
}
// Routes returns the available routers to the plugin controller
func (r *pluginRouter) Routes() []router.Route {
return r.routes
func (pr *pluginRouter) Routes() []router.Route {
return pr.routes
}
func (r *pluginRouter) initRoutes() {
r.routes = []router.Route{
router.NewGetRoute("/plugins", r.listPlugins),
router.NewGetRoute("/plugins/{name:.*}/json", r.inspectPlugin),
router.NewGetRoute("/plugins/privileges", r.getPrivileges),
router.NewDeleteRoute("/plugins/{name:.*}", r.removePlugin),
router.NewPostRoute("/plugins/{name:.*}/enable", r.enablePlugin),
router.NewPostRoute("/plugins/{name:.*}/disable", r.disablePlugin),
router.NewPostRoute("/plugins/pull", r.pullPlugin),
router.NewPostRoute("/plugins/{name:.*}/push", r.pushPlugin),
router.NewPostRoute("/plugins/{name:.*}/upgrade", r.upgradePlugin),
router.NewPostRoute("/plugins/{name:.*}/set", r.setPlugin),
router.NewPostRoute("/plugins/create", r.createPlugin),
func (pr *pluginRouter) initRoutes() {
pr.routes = []router.Route{
router.NewGetRoute("/plugins", pr.listPlugins),
router.NewGetRoute("/plugins/{name:.*}/json", pr.inspectPlugin),
router.NewGetRoute("/plugins/privileges", pr.getPrivileges),
router.NewDeleteRoute("/plugins/{name:.*}", pr.removePlugin),
router.NewPostRoute("/plugins/{name:.*}/enable", pr.enablePlugin),
router.NewPostRoute("/plugins/{name:.*}/disable", pr.disablePlugin),
router.NewPostRoute("/plugins/pull", pr.pullPlugin),
router.NewPostRoute("/plugins/{name:.*}/push", pr.pushPlugin),
router.NewPostRoute("/plugins/{name:.*}/upgrade", pr.upgradePlugin),
router.NewPostRoute("/plugins/{name:.*}/set", pr.setPlugin),
router.NewPostRoute("/plugins/create", pr.createPlugin),
}
}

@@ -18,12 +18,12 @@ func NewRouter(b Backend) router.Router {
}
// Routes returns the available routers to the session controller
func (r *sessionRouter) Routes() []router.Route {
return r.routes
func (sr *sessionRouter) Routes() []router.Route {
return sr.routes
}
func (r *sessionRouter) initRoutes() {
r.routes = []router.Route{
router.NewPostRoute("/session", r.startSession),
func (sr *sessionRouter) initRoutes() {
sr.routes = []router.Route{
router.NewPostRoute("/session", sr.startSession),
}
}

@@ -20,21 +20,21 @@ func NewRouter(b Backend, cb ClusterBackend) router.Router {
}
// Routes returns the available routes to the volumes controller
func (r *volumeRouter) Routes() []router.Route {
return r.routes
func (v *volumeRouter) Routes() []router.Route {
return v.routes
}
func (r *volumeRouter) initRoutes() {
r.routes = []router.Route{
func (v *volumeRouter) initRoutes() {
v.routes = []router.Route{
// GET
router.NewGetRoute("/volumes", r.getVolumesList),
router.NewGetRoute("/volumes/{name:.*}", r.getVolumeByName),
router.NewGetRoute("/volumes", v.getVolumesList),
router.NewGetRoute("/volumes/{name:.*}", v.getVolumeByName),
// POST
router.NewPostRoute("/volumes/create", r.postVolumesCreate),
router.NewPostRoute("/volumes/prune", r.postVolumesPrune),
router.NewPostRoute("/volumes/create", v.postVolumesCreate),
router.NewPostRoute("/volumes/prune", v.postVolumesPrune),
// PUT
router.NewPutRoute("/volumes/{name:.*}", r.putVolumesUpdate),
router.NewPutRoute("/volumes/{name:.*}", v.putVolumesUpdate),
// DELETE
router.NewDeleteRoute("/volumes/{name:.*}", r.deleteVolumes),
router.NewDeleteRoute("/volumes/{name:.*}", v.deleteVolumes),
}
}

@@ -278,38 +278,38 @@ func withoutHealthcheck() runConfigModifier {
}
func copyRunConfig(runConfig *container.Config, modifiers ...runConfigModifier) *container.Config {
copy := *runConfig
copy.Cmd = copyStringSlice(runConfig.Cmd)
copy.Env = copyStringSlice(runConfig.Env)
copy.Entrypoint = copyStringSlice(runConfig.Entrypoint)
copy.OnBuild = copyStringSlice(runConfig.OnBuild)
copy.Shell = copyStringSlice(runConfig.Shell)
c := *runConfig
c.Cmd = copyStringSlice(runConfig.Cmd)
c.Env = copyStringSlice(runConfig.Env)
c.Entrypoint = copyStringSlice(runConfig.Entrypoint)
c.OnBuild = copyStringSlice(runConfig.OnBuild)
c.Shell = copyStringSlice(runConfig.Shell)
if copy.Volumes != nil {
copy.Volumes = make(map[string]struct{}, len(runConfig.Volumes))
if c.Volumes != nil {
c.Volumes = make(map[string]struct{}, len(runConfig.Volumes))
for k, v := range runConfig.Volumes {
copy.Volumes[k] = v
c.Volumes[k] = v
}
}
if copy.ExposedPorts != nil {
copy.ExposedPorts = make(nat.PortSet, len(runConfig.ExposedPorts))
if c.ExposedPorts != nil {
c.ExposedPorts = make(nat.PortSet, len(runConfig.ExposedPorts))
for k, v := range runConfig.ExposedPorts {
copy.ExposedPorts[k] = v
c.ExposedPorts[k] = v
}
}
if copy.Labels != nil {
copy.Labels = make(map[string]string, len(runConfig.Labels))
if c.Labels != nil {
c.Labels = make(map[string]string, len(runConfig.Labels))
for k, v := range runConfig.Labels {
copy.Labels[k] = v
c.Labels[k] = v
}
}
for _, modifier := range modifiers {
modifier(&copy)
modifier(&c)
}
return &copy
return &c
}
func copyStringSlice(orig []string) []string {

@@ -166,17 +166,17 @@ func fullMutableRunConfig() *container.Config {
func TestDeepCopyRunConfig(t *testing.T) {
runConfig := fullMutableRunConfig()
copy := copyRunConfig(runConfig)
assert.Check(t, is.DeepEqual(fullMutableRunConfig(), copy))
deepCopy := copyRunConfig(runConfig)
assert.Check(t, is.DeepEqual(fullMutableRunConfig(), deepCopy))
copy.Cmd[1] = "arg2"
copy.Env[1] = "env2=new"
copy.ExposedPorts["10002"] = struct{}{}
copy.Volumes["three"] = struct{}{}
copy.Entrypoint[1] = "arg2"
copy.OnBuild[0] = "start"
copy.Labels["label3"] = "value3"
copy.Shell[0] = "sh"
deepCopy.Cmd[1] = "arg2"
deepCopy.Env[1] = "env2=new"
deepCopy.ExposedPorts["10002"] = struct{}{}
deepCopy.Volumes["three"] = struct{}{}
deepCopy.Entrypoint[1] = "arg2"
deepCopy.OnBuild[0] = "start"
deepCopy.Labels["label3"] = "value3"
deepCopy.Shell[0] = "sh"
assert.Check(t, is.DeepEqual(fullMutableRunConfig(), runConfig))
}

@@ -9,7 +9,9 @@ import (
"os"
"path/filepath"
"runtime"
"slices"
"sort"
"strconv"
"strings"
"sync"
"time"
@@ -67,6 +69,14 @@ import (
"tags.cncf.io/container-device-interface/pkg/cdi"
)
// strongTLSCiphers defines a secure, modern set of TLS cipher suites for use by the daemon.
var strongTLSCiphers = []uint16{
tls.TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
}
// DaemonCli represents the daemon CLI.
type DaemonCli struct {
*config.Config
@@ -779,6 +789,18 @@ func newAPIServerTLSConfig(config *config.Config) (*tls.Config, error) {
if err != nil {
return nil, errors.Wrap(err, "invalid TLS configuration")
}
// Optionally enforce strong TLS ciphers via the environment variable DOCKER_DISABLE_WEAK_CIPHERS.
// When set to true, weak TLS ciphers are disabled, restricting the daemon to a modern, secure
// subset of cipher suites.
if disableWeakCiphers := os.Getenv("DOCKER_DISABLE_WEAK_CIPHERS"); disableWeakCiphers != "" {
disable, err := strconv.ParseBool(disableWeakCiphers)
if err != nil {
return nil, errors.Wrap(err, "invalid value for DOCKER_DISABLE_WEAK_CIPHERS")
}
if disable {
tlsConfig.CipherSuites = slices.Clone(strongTLSCiphers)
}
}
}
return tlsConfig, nil

@@ -46,11 +46,11 @@ func (s *Health) Status() string {
// obeying the locking semantics.
//
// Status may be set directly if another lock is used.
func (s *Health) SetStatus(new string) {
func (s *Health) SetStatus(healthStatus string) {
s.mu.Lock()
defer s.mu.Unlock()
s.Health.Status = new
s.Health.Status = healthStatus
}
// OpenMonitorChannel creates and returns a new monitor channel. If there

@@ -72,30 +72,11 @@ fetch_blob() {
shift
local curlArgs=("$@")
local curlHeaders
curlHeaders="$(
curl -S "${curlArgs[@]}" \
-H "Authorization: Bearer $token" \
"$registryBase/v2/$image/blobs/$digest" \
-o "$targetFile" \
-D-
)"
curlHeaders="$(echo "$curlHeaders" | tr -d '\r')"
if grep -qE "^HTTP/[0-9].[0-9] 3" <<< "$curlHeaders"; then
rm -f "$targetFile"
local blobRedirect
blobRedirect="$(echo "$curlHeaders" | awk -F ': ' 'tolower($1) == "location" { print $2; exit }')"
if [ -z "$blobRedirect" ]; then
echo >&2 "error: failed fetching '$image' blob '$digest'"
echo "$curlHeaders" | head -1 >&2
return 1
fi
curl -fSL "${curlArgs[@]}" \
"$blobRedirect" \
-o "$targetFile"
fi
curl -L -S "${curlArgs[@]}" \
-H "Authorization: Bearer $token" \
"$registryBase/v2/$image/blobs/$digest" \
-o "$targetFile" \
-D-
}
# handle 'application/vnd.docker.distribution.manifest.v2+json' manifest

@@ -224,17 +224,17 @@ func (j *Journal) Data() (map[string]string, error) {
j.restartData()
for {
var (
data unsafe.Pointer
len C.size_t
data unsafe.Pointer
length C.size_t
)
rc := C.sd_journal_enumerate_data(j.j, &data, &len)
rc := C.sd_journal_enumerate_data(j.j, &data, &length)
if rc == 0 {
return m, nil
} else if rc < 0 {
return m, fmt.Errorf("journald: error enumerating entry data: %w", syscall.Errno(-rc))
}
k, v, _ := strings.Cut(C.GoStringN((*C.char)(data), C.int(len)), "=")
k, v, _ := strings.Cut(C.GoStringN((*C.char)(data), C.int(length)), "=")
m[k] = v
}
}

@@ -102,10 +102,10 @@ func New(info logger.Info) (logger.Logger, error) {
return nil, fmt.Errorf("journald is not enabled on this host")
}
return new(info)
return newJournald(info)
}
func new(info logger.Info) (*journald, error) {
func newJournald(info logger.Info) (*journald, error) {
// parse log tag
tag, err := loggerutils.ParseLogTag(info, loggerutils.DefaultTemplate)
if err != nil {

@@ -24,7 +24,7 @@ func TestLogRead(t *testing.T) {
// LogReader needs to filter out.
rotatedJournal := fake.NewT(t, journalDir+"/rotated.journal")
rotatedJournal.AssignEventTimestampFromSyslogTimestamp = true
l, err := new(logger.Info{
l, err := newJournald(logger.Info{
ContainerID: "wrongone0001",
ContainerName: "fake",
})
@@ -36,7 +36,7 @@ func TestLogRead(t *testing.T) {
activeJournal := fake.NewT(t, journalDir+"/fake.journal")
activeJournal.AssignEventTimestampFromSyslogTimestamp = true
l, err = new(logger.Info{
l, err = newJournald(logger.Info{
ContainerID: "wrongone0002",
ContainerName: "fake",
})
@@ -47,7 +47,7 @@ func TestLogRead(t *testing.T) {
assert.NilError(t, rotatedJournal.Send("a log message from a totally different process in the active journal", journal.PriInfo, nil))
return func(t *testing.T) logger.Logger {
l, err := new(info)
l, err := newJournald(info)
assert.NilError(t, err)
l.journalReadDir = journalDir
sl := &syncLogger{journald: l, waiters: map[uint64]chan<- struct{}{}}

@@ -510,12 +510,12 @@ func logMessages(t *testing.T, l logger.Logger, messages []*logger.Message) []*l
// existing behavior of the json-file log driver.
func transformToExpected(m *logger.Message) *logger.Message {
// Copy the log message again so as not to mutate the input.
copy := copyLogMessage(m)
logMessageCopy := copyLogMessage(m)
if m.PLogMetaData == nil || m.PLogMetaData.Last {
copy.Line = append(copy.Line, '\n')
logMessageCopy.Line = append(logMessageCopy.Line, '\n')
}
return copy
return logMessageCopy
}
func copyLogMessage(src *logger.Message) *logger.Message {

@@ -22,7 +22,7 @@ const extName = "LogDriver"
type logPlugin interface {
StartLogging(streamPath string, info Info) (err error)
StopLogging(streamPath string) (err error)
Capabilities() (cap Capability, err error)
Capabilities() (capability Capability, err error)
ReadLogs(info Info, config ReadConfig) (stream io.ReadCloser, err error)
}
@@ -90,9 +90,9 @@ func makePluginCreator(name string, l logPlugin, scopePath func(s string) string
logInfo: logCtx,
}
cap, err := a.plugin.Capabilities()
caps, err := a.plugin.Capabilities()
if err == nil {
a.capabilities = cap
a.capabilities = caps
}
stream, err := openPluginStream(a)
@@ -107,7 +107,7 @@ func makePluginCreator(name string, l logPlugin, scopePath func(s string) string
return nil, errors.Wrapf(err, "error creating logger")
}
if cap.ReadLogs {
if caps.ReadLogs {
return &pluginAdapterWithRead{a}, nil
}

@@ -80,13 +80,11 @@ func (pp *logPluginProxy) Capabilities() (cap Capability, err error) {
return
}
cap = ret.Cap
if ret.Err != "" {
err = errors.New(ret.Err)
}
return
return ret.Cap, err
}
type logPluginProxyReadLogsRequest struct {

@@ -641,7 +641,7 @@ func getMaxMountAndExistenceCheckAttempts(layer PushLayer) (maxMountAttempts, ma
func getRepositoryMountCandidates(
repoInfo reference.Named,
hmacKey []byte,
max int,
maxCandidates int,
v2Metadata []metadata.V2Metadata,
) []metadata.V2Metadata {
candidates := []metadata.V2Metadata{}
@@ -658,9 +658,9 @@ func getRepositoryMountCandidates(
}
sortV2MetadataByLikenessAndAge(repoInfo, hmacKey, candidates)
if max >= 0 && len(candidates) > max {
if maxCandidates >= 0 && len(candidates) > maxCandidates {
// select the youngest metadata
candidates = candidates[:max]
candidates = candidates[:maxCandidates]
}
return candidates

@@ -52,9 +52,9 @@ type DownloadOption func(*LayerDownloadManager)
// WithMaxDownloadAttempts configures the maximum number of download
// attempts for a download manager.
func WithMaxDownloadAttempts(max int) DownloadOption {
func WithMaxDownloadAttempts(maxDownloadAttempts int) DownloadOption {
return func(dlm *LayerDownloadManager) {
dlm.maxDownloadAttempts = max
dlm.maxDownloadAttempts = maxDownloadAttempts
}
}

@@ -172,11 +172,16 @@ variable "SYSTEMD" {
default = "false"
}
variable "FIREWALLD" {
default = "false"
}
target "dev" {
inherits = ["_common"]
target = "dev"
args = {
SYSTEMD = SYSTEMD
FIREWALLD = FIREWALLD
}
tags = ["docker-dev"]
output = ["type=docker"]

@@ -56,12 +56,27 @@ if [ -d /sys/kernel/security ] && ! mountpoint -q /sys/kernel/security; then
}
fi
# Allow connections coming from the host (through eth0). This is needed to
# access the daemon port (independently of which port is used), or run a
# 'remote' Delve session, etc...
if [ "${FIREWALLD:-}" = "true" ]; then
cat > /etc/firewalld/zones/trusted.xml << EOF
<?xml version="1.0" encoding="utf-8"?>
<zone target="ACCEPT">
<short>Trusted</short>
<description>All network connections are accepted.</description>
<interface name="eth0"/>
<forward/>
</zone>
EOF
fi
env > /etc/docker-entrypoint-env
cat > /etc/systemd/system/docker-entrypoint.target << EOF
[Unit]
Description=the target for docker-entrypoint.service
Requires=docker-entrypoint.service systemd-logind.service systemd-user-sessions.service
Requires=docker-entrypoint.service systemd-logind.service systemd-user-sessions.service $([ "${FIREWALLD:-}" = "true" ] && echo firewalld.service)
EOF
quoted_args="$(printf " %q" "${@}")"

@@ -5,27 +5,30 @@ set -e
SCRIPTDIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "${SCRIPTDIR}/.validate"
tidy_files=('vendor.mod' 'vendor.sum')
modules_files=('man/go.mod' 'vendor.mod')
tidy_files=("${modules_files[@]}" 'man/go.sum' 'vendor.sum')
vendor_files=("${tidy_files[@]}" 'vendor/')
validate_vendor_tidy() {
validate_tidy_modules() {
# check that all go.mod files exist in HEAD; go.sum files are generated by 'go mod tidy'
# so we don't need to check for their existence beforehand
for f in "${modules_files[@]}"; do
if [ ! -f "$f" ]; then
echo >&2 "ERROR: missing $f"
return 1
fi
done
# run mod tidy
./hack/vendor.sh tidy
# check if any files have changed
git diff --quiet HEAD -- "${tidy_files[@]}"
git diff --quiet HEAD -- "${tidy_files[@]}" && [ -z "$(git ls-files --others --exclude-standard)" ]
}
validate_vendor_diff() {
mapfile -t changed_files < <(validate_diff --diff-filter=ACMR --name-only -- "${vendor_files[@]}")
if [ -n "${TEST_FORCE_VALIDATE:-}" ] || [ "${#changed_files[@]}" -gt 0 ]; then
# recreate vendor/
./hack/vendor.sh vendor
# check if any files have changed
git diff --quiet HEAD -- "${vendor_files[@]}"
else
echo >&2 'INFO: no vendor changes in diff; skipping vendor check.'
fi
# recreate vendor/
./hack/vendor.sh vendor
# check if any files have changed
git diff --quiet HEAD -- "${vendor_files[@]}" && [ -z "$(git ls-files --others --exclude-standard)" ]
}
validate_vendor_license() {
@@ -37,16 +40,22 @@ validate_vendor_license() {
done < <(awk '/^# /{ print $2 }' vendor/modules.txt)
}
if validate_vendor_tidy && validate_vendor_diff && validate_vendor_license; then
if validate_tidy_modules && validate_vendor_diff && validate_vendor_license; then
echo >&2 'PASS: Vendoring has been performed correctly!'
else
{
echo 'FAIL: Vendoring was not performed correctly!'
echo
echo 'The following files changed during re-vendor:'
echo
git diff --name-status HEAD -- "${vendor_files[@]}"
echo
if [ -n "$(git ls-files --others --exclude-standard)" ]; then
echo 'The following files are missing:'
git ls-files --others --exclude-standard
echo
fi
if [ -n "$(git diff --name-status HEAD -- "${vendor_files[@]}")" ]; then
echo 'The following files changed during re-vendor:'
git diff --name-status HEAD -- "${vendor_files[@]}"
echo
fi
echo 'Please revendor with hack/vendor.sh'
echo
git diff --diff-filter=M -- "${vendor_files[@]}"

@@ -7,15 +7,32 @@
set -e
SCRIPTDIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_DIR="$(cd "$SCRIPTDIR/.." && pwd)"
tidy() (
(
set -x
"${SCRIPTDIR}"/with-go-mod.sh go mod tidy -modfile vendor.mod -compat 1.18
)
(
set -x
cd man
go mod tidy
)
)
vendor() (
(
set -x
"${SCRIPTDIR}"/with-go-mod.sh go mod vendor -modfile vendor.mod
)
(
set -x
cd man
go mod vendor
)
)
help() {

@@ -7,6 +7,7 @@ import (
"io"
"os"
"path/filepath"
"strings"
"testing"
"github.com/docker/docker/api/types"
@@ -31,6 +32,8 @@ func TestCopyFromContainerPathDoesNotExist(t *testing.T) {
assert.Check(t, is.ErrorContains(err, "Could not find the file /dne in container "+cid))
}
// TestCopyFromContainerPathIsNotDir tests that an error is returned when
// trying to create a directory on a path that's a file.
func TestCopyFromContainerPathIsNotDir(t *testing.T) {
skip.If(t, testEnv.UsingSnapshotter(), "FIXME: https://github.com/moby/moby/issues/47107")
ctx := setupTest(t)
@@ -38,14 +41,29 @@ func TestCopyFromContainerPathIsNotDir(t *testing.T) {
apiClient := testEnv.APIClient()
cid := container.Create(ctx, t, apiClient)
path := "/etc/passwd/"
expected := "not a directory"
// Pick a path that already exists as a file; on Linux "/etc/passwd"
// is expected to be there, so we pick that for convenience.
existingFile := "/etc/passwd/"
expected := []string{"not a directory"}
if testEnv.DaemonInfo.OSType == "windows" {
path = "c:/windows/system32/drivers/etc/hosts/"
expected = "The filename, directory name, or volume label syntax is incorrect."
existingFile = "c:/windows/system32/drivers/etc/hosts/"
// Depending on the version of Windows, this produces a "ERROR_INVALID_NAME" (Windows < 2025),
// or a "ERROR_DIRECTORY" (Windows 2025); https://learn.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-
expected = []string{
"The directory name is invalid.", // ERROR_DIRECTORY
"The filename, directory name, or volume label syntax is incorrect.", // ERROR_INVALID_NAME
}
}
_, _, err := apiClient.CopyFromContainer(ctx, cid, path)
assert.Assert(t, is.ErrorContains(err, expected))
_, _, err := apiClient.CopyFromContainer(ctx, cid, existingFile)
var found bool
for _, expErr := range expected {
if err != nil && strings.Contains(err.Error(), expErr) {
found = true
break
}
}
assert.Check(t, found, "Expected error to be one of %v, but got %v", expected, err)
}
func TestCopyToContainerPathDoesNotExist(t *testing.T) {

@@ -0,0 +1,412 @@
package container // import "github.com/docker/docker/integration/container"
import (
"context"
"strings"
"testing"
"time"
"github.com/docker/docker/api/types"
containertypes "github.com/docker/docker/api/types/container"
"github.com/docker/docker/api/types/mount"
"github.com/docker/docker/api/types/volume"
"github.com/docker/docker/integration/internal/container"
"github.com/docker/docker/testutil"
"gotest.tools/v3/assert"
is "gotest.tools/v3/assert/cmp"
)
// TestWindowsProcessIsolation validates process isolation on Windows.
func TestWindowsProcessIsolation(t *testing.T) {
ctx := setupTest(t)
apiClient := testEnv.APIClient()
testcases := []struct {
name string
description string
validate func(t *testing.T, ctx context.Context, id string)
}{
{
name: "Process isolation basic container lifecycle",
description: "Validate container can start, run, and stop with process isolation",
validate: func(t *testing.T, ctx context.Context, id string) {
// Verify container is running
ctrInfo := container.Inspect(ctx, t, apiClient, id)
assert.Check(t, is.Equal(ctrInfo.State.Running, true))
assert.Check(t, is.Equal(ctrInfo.HostConfig.Isolation, containertypes.IsolationProcess))
execCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
defer cancel()
res := container.ExecT(execCtx, t, apiClient, id, []string{"cmd", "/c", "echo", "test"})
assert.Check(t, is.Equal(res.ExitCode, 0))
assert.Check(t, strings.Contains(res.Stdout(), "test"))
},
},
{
name: "Process isolation filesystem access",
description: "Validate filesystem operations work correctly with process isolation",
validate: func(t *testing.T, ctx context.Context, id string) {
execCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
defer cancel()
// Create a test file
res := container.ExecT(execCtx, t, apiClient, id,
[]string{"cmd", "/c", "echo test123 > C:\\testfile.txt"})
assert.Check(t, is.Equal(res.ExitCode, 0))
// Read the test file
execCtx2, cancel2 := context.WithTimeout(ctx, 10*time.Second)
defer cancel2()
res2 := container.ExecT(execCtx2, t, apiClient, id,
[]string{"cmd", "/c", "type", "C:\\testfile.txt"})
assert.Check(t, is.Equal(res2.ExitCode, 0))
assert.Check(t, strings.Contains(res2.Stdout(), "test123"))
},
},
{
name: "Process isolation network connectivity",
description: "Validate network connectivity works with process isolation",
validate: func(t *testing.T, ctx context.Context, id string) {
execCtx, cancel := context.WithTimeout(ctx, 15*time.Second)
defer cancel()
// Test localhost connectivity
res := container.ExecT(execCtx, t, apiClient, id,
[]string{"ping", "-n", "1", "-w", "3000", "localhost"})
assert.Check(t, is.Equal(res.ExitCode, 0))
assert.Check(t, strings.Contains(res.Stdout(), "Reply from") ||
strings.Contains(res.Stdout(), "Received = 1"))
},
},
{
name: "Process isolation environment variables",
description: "Validate environment variables are properly isolated",
validate: func(t *testing.T, ctx context.Context, id string) {
execCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
defer cancel()
// Check that container has expected environment variables
res := container.ExecT(execCtx, t, apiClient, id,
[]string{"cmd", "/c", "set"})
assert.Check(t, is.Equal(res.ExitCode, 0))
// Should have Windows-specific environment variables
stdout := res.Stdout()
assert.Check(t, strings.Contains(stdout, "COMPUTERNAME") ||
strings.Contains(stdout, "OS=Windows"))
},
},
{
name: "Process isolation CPU access",
description: "Validate container can access CPU information",
validate: func(t *testing.T, ctx context.Context, id string) {
execCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
defer cancel()
// Check NUMBER_OF_PROCESSORS environment variable
res := container.ExecT(execCtx, t, apiClient, id,
[]string{"cmd", "/c", "echo", "%NUMBER_OF_PROCESSORS%"})
assert.Check(t, is.Equal(res.ExitCode, 0))
// Should return a number
output := strings.TrimSpace(res.Stdout())
assert.Check(t, output != "" && output != "%NUMBER_OF_PROCESSORS%",
"NUMBER_OF_PROCESSORS not set")
},
},
}
for _, tc := range testcases {
t.Run(tc.name, func(t *testing.T) {
ctx := testutil.StartSpan(ctx, t)
// Create and start container with process isolation
id := container.Run(ctx, t, apiClient,
container.WithIsolation(containertypes.IsolationProcess),
container.WithCmd("ping", "-t", "localhost"),
)
defer apiClient.ContainerRemove(ctx, id, containertypes.RemoveOptions{Force: true})
tc.validate(t, ctx, id)
})
}
}
// TestWindowsHyperVIsolation validates Hyper-V isolation on Windows.
func TestWindowsHyperVIsolation(t *testing.T) {
ctx := setupTest(t)
apiClient := testEnv.APIClient()
testcases := []struct {
name string
description string
validate func(t *testing.T, ctx context.Context, id string)
}{
{
name: "Hyper-V isolation basic container lifecycle",
description: "Validate container can start, run, and stop with Hyper-V isolation",
validate: func(t *testing.T, ctx context.Context, id string) {
// Verify container is running
ctrInfo := container.Inspect(ctx, t, apiClient, id)
assert.Check(t, is.Equal(ctrInfo.State.Running, true))
assert.Check(t, is.Equal(ctrInfo.HostConfig.Isolation, containertypes.IsolationHyperV))
// Execute a simple command
execCtx, cancel := context.WithTimeout(ctx, 15*time.Second)
defer cancel()
res := container.ExecT(execCtx, t, apiClient, id, []string{"cmd", "/c", "echo", "hyperv-test"})
assert.Check(t, is.Equal(res.ExitCode, 0))
assert.Check(t, strings.Contains(res.Stdout(), "hyperv-test"))
},
},
{
name: "Hyper-V isolation filesystem operations",
description: "Validate filesystem isolation with Hyper-V",
validate: func(t *testing.T, ctx context.Context, id string) {
execCtx, cancel := context.WithTimeout(ctx, 15*time.Second)
defer cancel()
// Test file creation
res := container.ExecT(execCtx, t, apiClient, id,
[]string{"cmd", "/c", "echo hyperv-file > C:\\hvtest.txt"})
assert.Check(t, is.Equal(res.ExitCode, 0))
// Test file read
execCtx2, cancel2 := context.WithTimeout(ctx, 15*time.Second)
defer cancel2()
res2 := container.ExecT(execCtx2, t, apiClient, id,
[]string{"cmd", "/c", "type", "C:\\hvtest.txt"})
assert.Check(t, is.Equal(res2.ExitCode, 0))
assert.Check(t, strings.Contains(res2.Stdout(), "hyperv-file"))
},
},
{
name: "Hyper-V isolation network connectivity",
description: "Validate network works with Hyper-V isolation",
validate: func(t *testing.T, ctx context.Context, id string) {
execCtx, cancel := context.WithTimeout(ctx, 15*time.Second)
defer cancel()
// Test localhost connectivity
res := container.ExecT(execCtx, t, apiClient, id,
[]string{"ping", "-n", "1", "-w", "5000", "localhost"})
assert.Check(t, is.Equal(res.ExitCode, 0))
},
},
}
for _, tc := range testcases {
t.Run(tc.name, func(t *testing.T) {
ctx := testutil.StartSpan(ctx, t)
// Create and start container with Hyper-V isolation
id := container.Run(ctx, t, apiClient,
container.WithIsolation(containertypes.IsolationHyperV),
container.WithCmd("ping", "-t", "localhost"),
)
defer apiClient.ContainerRemove(ctx, id, containertypes.RemoveOptions{Force: true})
tc.validate(t, ctx, id)
})
}
}
// TestWindowsIsolationComparison validates that both isolation modes can coexist
// and that containers can be created with different isolation modes on Windows.
func TestWindowsIsolationComparison(t *testing.T) {
ctx := setupTest(t)
apiClient := testEnv.APIClient()
// Create container with process isolation
processID := container.Run(ctx, t, apiClient,
container.WithIsolation(containertypes.IsolationProcess),
container.WithCmd("ping", "-t", "localhost"),
)
defer apiClient.ContainerRemove(ctx, processID, containertypes.RemoveOptions{Force: true})
processInfo := container.Inspect(ctx, t, apiClient, processID)
assert.Check(t, is.Equal(processInfo.HostConfig.Isolation, containertypes.IsolationProcess))
assert.Check(t, is.Equal(processInfo.State.Running, true))
// Create container with Hyper-V isolation
hypervID := container.Run(ctx, t, apiClient,
container.WithIsolation(containertypes.IsolationHyperV),
container.WithCmd("ping", "-t", "localhost"),
)
defer apiClient.ContainerRemove(ctx, hypervID, containertypes.RemoveOptions{Force: true})
hypervInfo := container.Inspect(ctx, t, apiClient, hypervID)
assert.Check(t, is.Equal(hypervInfo.HostConfig.Isolation, containertypes.IsolationHyperV))
assert.Check(t, is.Equal(hypervInfo.State.Running, true))
// Verify both containers can run simultaneously
processInfo2 := container.Inspect(ctx, t, apiClient, processID)
hypervInfo2 := container.Inspect(ctx, t, apiClient, hypervID)
assert.Check(t, is.Equal(processInfo2.State.Running, true))
assert.Check(t, is.Equal(hypervInfo2.State.Running, true))
}
// TestWindowsProcessIsolationResourceConstraints validates resource constraints
// work correctly with process isolation on Windows.
func TestWindowsProcessIsolationResourceConstraints(t *testing.T) {
ctx := setupTest(t)
apiClient := testEnv.APIClient()
testcases := []struct {
name string
cpuShares int64
nanoCPUs int64
memoryLimit int64
cpuCount int64
validateConfig func(t *testing.T, ctrInfo types.ContainerJSON)
}{
{
name: "CPU shares constraint - config only",
cpuShares: 512,
// Note: CPU shares are accepted by the API but NOT enforced on Windows.
// This test only verifies the configuration is stored correctly.
// Actual enforcement does not work - containers get equal CPU regardless of shares.
// Use NanoCPUs (--cpus flag) for actual CPU limiting on Windows.
validateConfig: func(t *testing.T, ctrInfo types.ContainerJSON) {
assert.Check(t, is.Equal(ctrInfo.HostConfig.CPUShares, int64(512)))
},
},
{
name: "CPU limit (NanoCPUs) constraint",
nanoCPUs: 2000000000, // 2.0 CPUs
// NanoCPUs enforce hard CPU limits on Windows (unlike CPUShares which don't work)
validateConfig: func(t *testing.T, ctrInfo types.ContainerJSON) {
assert.Check(t, is.Equal(ctrInfo.HostConfig.NanoCPUs, int64(2000000000)))
},
},
{
name: "Memory limit constraint",
memoryLimit: 512 * 1024 * 1024, // 512MB
// Memory limits enforce hard limits on container memory usage
validateConfig: func(t *testing.T, ctrInfo types.ContainerJSON) {
assert.Check(t, is.Equal(ctrInfo.HostConfig.Memory, int64(512*1024*1024)))
},
},
{
name: "CPU count constraint",
cpuCount: 2,
// CPU count limits the number of CPUs available to the container
validateConfig: func(t *testing.T, ctrInfo types.ContainerJSON) {
assert.Check(t, is.Equal(ctrInfo.HostConfig.CPUCount, int64(2)))
},
},
}
for _, tc := range testcases {
t.Run(tc.name, func(t *testing.T) {
ctx := testutil.StartSpan(ctx, t)
opts := []func(*container.TestContainerConfig){
container.WithIsolation(containertypes.IsolationProcess),
container.WithCmd("ping", "-t", "localhost"),
}
if tc.cpuShares > 0 {
opts = append(opts, func(config *container.TestContainerConfig) {
config.HostConfig.CPUShares = tc.cpuShares
})
}
if tc.nanoCPUs > 0 {
opts = append(opts, func(config *container.TestContainerConfig) {
config.HostConfig.NanoCPUs = tc.nanoCPUs
})
}
if tc.memoryLimit > 0 {
opts = append(opts, func(config *container.TestContainerConfig) {
config.HostConfig.Memory = tc.memoryLimit
})
}
if tc.cpuCount > 0 {
opts = append(opts, func(config *container.TestContainerConfig) {
config.HostConfig.CPUCount = tc.cpuCount
})
}
id := container.Run(ctx, t, apiClient, opts...)
defer apiClient.ContainerRemove(ctx, id, containertypes.RemoveOptions{Force: true})
ctrInfo := container.Inspect(ctx, t, apiClient, id)
tc.validateConfig(t, ctrInfo)
})
}
}
// TestWindowsProcessIsolationVolumeMount validates volume mounting with process isolation on Windows.
func TestWindowsProcessIsolationVolumeMount(t *testing.T) {
ctx := setupTest(t)
apiClient := testEnv.APIClient()
volumeName := "process-iso-test-volume"
volRes, err := apiClient.VolumeCreate(ctx, volume.CreateOptions{
Name: volumeName,
})
assert.NilError(t, err)
defer func() {
// Force volume removal in case container cleanup fails
apiClient.VolumeRemove(ctx, volRes.Name, true)
}()
// Create container with volume mount
id := container.Run(ctx, t, apiClient,
container.WithIsolation(containertypes.IsolationProcess),
container.WithCmd("ping", "-t", "localhost"),
container.WithMount(mount.Mount{
Type: mount.TypeVolume,
Source: volumeName,
Target: "C:\\data",
}),
)
defer apiClient.ContainerRemove(ctx, id, containertypes.RemoveOptions{Force: true})
// Write data to mounted volume
execCtx, cancel := context.WithTimeout(ctx, 10*time.Second)
defer cancel()
res := container.ExecT(execCtx, t, apiClient, id,
[]string{"cmd", "/c", "echo volume-test > C:\\data\\test.txt"})
assert.Check(t, is.Equal(res.ExitCode, 0))
// Read data from mounted volume
execCtx2, cancel2 := context.WithTimeout(ctx, 10*time.Second)
defer cancel2()
res2 := container.ExecT(execCtx2, t, apiClient, id,
[]string{"cmd", "/c", "type", "C:\\data\\test.txt"})
assert.Check(t, is.Equal(res2.ExitCode, 0))
assert.Check(t, strings.Contains(res2.Stdout(), "volume-test"))
// Verify container has volume mount
ctrInfo := container.Inspect(ctx, t, apiClient, id)
assert.Check(t, len(ctrInfo.Mounts) == 1)
assert.Check(t, is.Equal(ctrInfo.Mounts[0].Type, mount.TypeVolume))
assert.Check(t, is.Equal(ctrInfo.Mounts[0].Name, volumeName))
}
// TestWindowsHyperVIsolationResourceLimits validates that resource limits can be
// configured on Hyper-V isolated containers. It verifies the stored configuration,
// not runtime enforcement.
func TestWindowsHyperVIsolationResourceLimits(t *testing.T) {
ctx := setupTest(t)
apiClient := testEnv.APIClient()
// Create container with memory limit
memoryLimit := int64(512 * 1024 * 1024) // 512MB
id := container.Run(ctx, t, apiClient,
container.WithIsolation(containertypes.IsolationHyperV),
container.WithCmd("ping", "-t", "localhost"),
func(config *container.TestContainerConfig) {
config.HostConfig.Memory = memoryLimit
},
)
defer apiClient.ContainerRemove(ctx, id, containertypes.RemoveOptions{Force: true})
// Verify resource limit is set
ctrInfo := container.Inspect(ctx, t, apiClient, id)
assert.Check(t, is.Equal(ctrInfo.HostConfig.Memory, memoryLimit))
assert.Check(t, is.Equal(ctrInfo.HostConfig.Isolation, containertypes.IsolationHyperV))
}


@@ -1,6 +1,7 @@
package container
import (
"slices"
"strings"
"github.com/docker/docker/api/types/container"
@@ -56,6 +57,16 @@ func WithExposedPorts(ports ...string) func(*TestContainerConfig) {
}
}
// WithPortMap sets/replaces port mappings.
func WithPortMap(pm nat.PortMap) func(*TestContainerConfig) {
return func(c *TestContainerConfig) {
c.HostConfig.PortBindings = nat.PortMap{}
for p, b := range pm {
c.HostConfig.PortBindings[p] = slices.Clone(b)
}
}
}
// WithTty sets the TTY mode of the container
func WithTty(tty bool) func(*TestContainerConfig) {
return func(c *TestContainerConfig) {


@@ -6,11 +6,17 @@ import (
"testing"
"time"
containertypes "github.com/docker/docker/api/types/container"
networktypes "github.com/docker/docker/api/types/network"
"github.com/docker/docker/api/types/versions"
ctr "github.com/docker/docker/integration/internal/container"
"github.com/docker/docker/integration/internal/network"
"github.com/docker/docker/internal/testutils/networking"
"github.com/docker/docker/libnetwork/drivers/bridge"
"github.com/docker/docker/testutil/daemon"
"github.com/docker/go-connections/nat"
"gotest.tools/v3/assert"
"gotest.tools/v3/icmd"
"gotest.tools/v3/skip"
)
@@ -43,3 +49,62 @@ func TestCreateWithMultiNetworks(t *testing.T) {
ifacesWithAddress := strings.Count(res.Stdout.String(), "\n")
assert.Equal(t, ifacesWithAddress, 3)
}
// TestFirewalldReloadNoZombies checks that when firewalld is reloaded, rules
// belonging to deleted networks/containers do not reappear.
func TestFirewalldReloadNoZombies(t *testing.T) {
skip.If(t, testEnv.DaemonInfo.OSType == "windows")
skip.If(t, !networking.FirewalldRunning(), "firewalld is not running")
skip.If(t, testEnv.IsRootless, "no firewalld in rootless netns")
ctx := setupTest(t)
d := daemon.New(t)
d.StartWithBusybox(ctx, t)
defer d.Stop(t)
c := d.NewClientT(t)
const bridgeName = "br-fwdreload"
removed := false
nw := network.CreateNoError(ctx, t, c, "testnet",
network.WithOption(bridge.BridgeName, bridgeName))
defer func() {
if !removed {
network.RemoveNoError(ctx, t, c, nw)
}
}()
cid := ctr.Run(ctx, t, c,
ctr.WithExposedPorts("80/tcp", "81/tcp"),
ctr.WithPortMap(nat.PortMap{"80/tcp": {{HostPort: "8000"}}}))
defer func() {
if !removed {
ctr.Remove(ctx, t, c, cid, containertypes.RemoveOptions{Force: true})
}
}()
iptablesSave := icmd.Command("iptables-save")
resBeforeDel := icmd.RunCmd(iptablesSave)
assert.NilError(t, resBeforeDel.Error)
assert.Check(t, strings.Contains(resBeforeDel.Combined(), bridgeName),
"With container: expected rules for %s in: %s", bridgeName, resBeforeDel.Combined())
// Delete the container and its network.
ctr.Remove(ctx, t, c, cid, containertypes.RemoveOptions{Force: true})
network.RemoveNoError(ctx, t, c, nw)
removed = true
// Check the network does not appear in iptables rules.
resAfterDel := icmd.RunCmd(iptablesSave)
assert.NilError(t, resAfterDel.Error)
assert.Check(t, !strings.Contains(resAfterDel.Combined(), bridgeName),
"After deletes: did not expect rules for %s in: %s", bridgeName, resAfterDel.Combined())
// firewall-cmd --reload, and wait for the daemon to restore rules.
networking.FirewalldReload(t, d)
// Check that rules for the deleted container/network have not reappeared.
resAfterReload := icmd.RunCmd(iptablesSave)
assert.NilError(t, resAfterReload.Error)
assert.Check(t, !strings.Contains(resAfterReload.Combined(), bridgeName),
"After reload: did not expect rules for %s in: %s", bridgeName, resAfterReload.Combined())
}


@@ -10,6 +10,7 @@ import (
containertypes "github.com/docker/docker/api/types/container"
"github.com/docker/docker/integration/internal/container"
"github.com/docker/docker/integration/internal/network"
"github.com/docker/docker/internal/testutils/networking"
"github.com/docker/docker/testutil"
"github.com/docker/docker/testutil/daemon"
"gotest.tools/v3/assert"
@@ -160,6 +161,8 @@ func TestBridgeICC(t *testing.T) {
Force: true,
})
networking.FirewalldReload(t, d)
pingHost := tc.pingHost
if pingHost == "" {
if tc.linkLocal {
@@ -235,7 +238,7 @@ func TestBridgeICCWindows(t *testing.T) {
pingCmd := []string{"ping", "-n", "1", "-w", "3000", ctr1Name}
const ctr2Name = "ctr2"
-attachCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
+attachCtx, cancel := context.WithTimeout(ctx, 15*time.Second)
defer cancel()
res := container.RunAttach(attachCtx, t, c,
container.WithName(ctr2Name),
@@ -351,6 +354,7 @@ func TestBridgeINC(t *testing.T) {
defer c.ContainerRemove(ctx, id1, containertypes.RemoveOptions{
Force: true,
})
networking.FirewalldReload(t, d)
ctr1Info := container.Inspect(ctx, t, c, id1)
targetAddr := ctr1Info.NetworkSettings.Networks[bridge1].IPAddress
@@ -575,6 +579,7 @@ func TestInternalNwConnectivity(t *testing.T) {
container.WithNetworkMode(bridgeName),
)
defer c.ContainerRemove(ctx, id, containertypes.RemoveOptions{Force: true})
networking.FirewalldReload(t, d)
execCtx, cancel := context.WithTimeout(ctx, 20*time.Second)
defer cancel()


@@ -0,0 +1,378 @@
package networking
import (
"context"
"fmt"
"io"
"net/http"
"strings"
"testing"
"time"
"github.com/docker/docker/api/types"
containertypes "github.com/docker/docker/api/types/container"
"github.com/docker/docker/integration/internal/container"
"github.com/docker/docker/integration/internal/network"
"github.com/docker/docker/testutil"
"github.com/docker/go-connections/nat"
"gotest.tools/v3/assert"
is "gotest.tools/v3/assert/cmp"
"gotest.tools/v3/poll"
"gotest.tools/v3/skip"
)
// TestWindowsNetworkDrivers validates the Windows-specific network drivers:
// NAT, Transparent, and L2Bridge.
func TestWindowsNetworkDrivers(t *testing.T) {
ctx := setupTest(t)
c := testEnv.APIClient()
testcases := []struct {
name string
driver string
}{
{
// NAT connectivity is already tested in TestBridgeICCWindows (bridge_test.go),
// so we only validate network creation here.
name: "NAT driver network creation",
driver: "nat",
},
{
// Only test creation of a Transparent driver network, connectivity depends on external
// network infrastructure.
name: "Transparent driver network creation",
driver: "transparent",
},
{
// L2Bridge driver requires specific host network adapter configuration, test will skip
// if host configuration is missing.
name: "L2Bridge driver network creation",
driver: "l2bridge",
},
}
for tcID, tc := range testcases {
t.Run(tc.name, func(t *testing.T) {
ctx := testutil.StartSpan(ctx, t)
netName := fmt.Sprintf("test-%s-%d", tc.driver, tcID)
// Create network with specified driver
netResp, err := c.NetworkCreate(ctx, netName, types.NetworkCreate{
Driver: tc.driver,
})
if err != nil {
// L2Bridge may fail if host network configuration is not available
if tc.driver == "l2bridge" {
errStr := strings.ToLower(err.Error())
if strings.Contains(errStr, "the network does not have a subnet for this endpoint") {
t.Skipf("Driver %s requires host network configuration: %v", tc.driver, err)
}
}
t.Fatalf("Failed to create network with %s driver: %v", tc.driver, err)
}
defer network.RemoveNoError(ctx, t, c, netName)
// Inspect network to validate driver is correctly set
netInfo, err := c.NetworkInspect(ctx, netResp.ID, types.NetworkInspectOptions{})
assert.NilError(t, err)
assert.Check(t, is.Equal(netInfo.Driver, tc.driver), "Network driver mismatch")
assert.Check(t, is.Equal(netInfo.Name, netName), "Network name mismatch")
})
}
}
// TestWindowsNATDriverPortMapping validates NAT port mapping by testing host connectivity.
func TestWindowsNATDriverPortMapping(t *testing.T) {
ctx := setupTest(t)
c := testEnv.APIClient()
// Use default NAT network which supports port mapping
netName := "nat"
// PowerShell HTTP listener on port 80
psScript := `
$listener = New-Object System.Net.HttpListener
$listener.Prefixes.Add('http://+:80/')
$listener.Start()
while ($listener.IsListening) {
$context = $listener.GetContext()
$response = $context.Response
$content = [System.Text.Encoding]::UTF8.GetBytes('OK')
$response.ContentLength64 = $content.Length
$response.OutputStream.Write($content, 0, $content.Length)
$response.OutputStream.Close()
}
`
// Create container with port mapping 80->8080
ctrName := "port-mapping-test"
id := container.Run(ctx, t, c,
container.WithName(ctrName),
container.WithCmd("powershell", "-Command", psScript),
container.WithNetworkMode(netName),
container.WithExposedPorts("80/tcp"),
container.WithPortMap(nat.PortMap{
"80/tcp": []nat.PortBinding{{HostPort: "8080"}},
}),
)
defer c.ContainerRemove(ctx, id, containertypes.RemoveOptions{Force: true})
// Verify port mapping metadata
ctrInfo := container.Inspect(ctx, t, c, id)
portKey := nat.Port("80/tcp")
assert.Check(t, ctrInfo.NetworkSettings.Ports[portKey] != nil, "Port mapping not found")
assert.Check(t, len(ctrInfo.NetworkSettings.Ports[portKey]) > 0, "No host port binding")
assert.Check(t, is.Equal(ctrInfo.NetworkSettings.Ports[portKey][0].HostPort, "8080"))
// Test actual connectivity from host to container via mapped port
httpClient := &http.Client{Timeout: 2 * time.Second}
checkHTTP := func(t poll.LogT) poll.Result {
resp, err := httpClient.Get("http://localhost:8080")
if err != nil {
return poll.Continue("connection failed: %v", err)
}
defer resp.Body.Close()
body, err := io.ReadAll(resp.Body)
if err != nil {
return poll.Continue("failed to read body: %v", err)
}
if !strings.Contains(string(body), "OK") {
return poll.Continue("unexpected response body: %s", string(body))
}
return poll.Success()
}
poll.WaitOn(t, checkHTTP, poll.WithTimeout(10*time.Second))
}
// TestWindowsNetworkDNSResolution validates DNS resolution on Windows networks.
func TestWindowsNetworkDNSResolution(t *testing.T) {
ctx := setupTest(t)
c := testEnv.APIClient()
testcases := []struct {
name string
driver string
customDNS bool
dnsServers []string
}{
{
name: "Default NAT network DNS resolution",
driver: "nat",
},
{
name: "Custom DNS servers on NAT network",
driver: "nat",
customDNS: true,
dnsServers: []string{"8.8.8.8", "8.8.4.4"},
},
}
for tcID, tc := range testcases {
t.Run(tc.name, func(t *testing.T) {
ctx := testutil.StartSpan(ctx, t)
netName := fmt.Sprintf("test-dns-%s-%d", tc.driver, tcID)
// Create network with optional custom DNS
netOpts := []func(*types.NetworkCreate){
network.WithDriver(tc.driver),
}
if tc.customDNS {
// The windowsshim driver takes the DNS servers as a single comma-separated
// option value; setting the option once per server would just overwrite the
// previous value.
netOpts = append(netOpts, network.WithOption("com.docker.network.windowsshim.dnsservers", strings.Join(tc.dnsServers, ",")))
}
network.CreateNoError(ctx, t, c, netName, netOpts...)
defer network.RemoveNoError(ctx, t, c, netName)
// Create container and verify DNS resolution
ctrName := fmt.Sprintf("dns-test-%d", tcID)
id := container.Run(ctx, t, c,
container.WithName(ctrName),
container.WithNetworkMode(netName),
)
defer c.ContainerRemove(ctx, id, containertypes.RemoveOptions{Force: true})
// Test DNS resolution by pinging container by name from another container
pingCmd := []string{"ping", "-n", "1", "-w", "3000", ctrName}
attachCtx, cancel := context.WithTimeout(ctx, 15*time.Second)
defer cancel()
res := container.RunAttach(attachCtx, t, c,
container.WithCmd(pingCmd...),
container.WithNetworkMode(netName),
)
defer c.ContainerRemove(ctx, res.ContainerID, containertypes.RemoveOptions{Force: true})
assert.Check(t, is.Equal(res.ExitCode, 0), "DNS resolution failed")
assert.Check(t, is.Contains(res.Stdout.String(), "Sent = 1, Received = 1, Lost = 0"))
})
}
}
// TestWindowsNetworkLifecycle validates network lifecycle operations on Windows.
// Tests network creation, container attachment, detachment, and deletion.
func TestWindowsNetworkLifecycle(t *testing.T) {
// Skip this test on Windows Containerd because NetworkConnect operations fail with an
// unsupported platform request error:
// https://github.com/moby/moby/issues/51589
skip.If(t, testEnv.RuntimeIsWindowsContainerd(),
"Skipping test: fails on Containerd due to unsupported platform request error during NetworkConnect operations")
ctx := setupTest(t)
c := testEnv.APIClient()
netName := "lifecycle-test-nat"
netID := network.CreateNoError(ctx, t, c, netName,
network.WithDriver("nat"),
)
netInfo, err := c.NetworkInspect(ctx, netID, types.NetworkInspectOptions{})
assert.NilError(t, err)
assert.Check(t, is.Equal(netInfo.Name, netName))
// Create container on network
ctrName := "lifecycle-ctr"
id := container.Run(ctx, t, c,
container.WithName(ctrName),
container.WithNetworkMode(netName),
)
ctrInfo := container.Inspect(ctx, t, c, id)
assert.Check(t, ctrInfo.NetworkSettings.Networks[netName] != nil)
// Disconnect container from network
err = c.NetworkDisconnect(ctx, netID, id, false)
assert.NilError(t, err)
ctrInfo = container.Inspect(ctx, t, c, id)
assert.Check(t, ctrInfo.NetworkSettings.Networks[netName] == nil, "Container still connected after disconnect")
// Reconnect container to network
err = c.NetworkConnect(ctx, netID, id, nil)
assert.NilError(t, err)
ctrInfo = container.Inspect(ctx, t, c, id)
assert.Check(t, ctrInfo.NetworkSettings.Networks[netName] != nil, "Container not reconnected")
err = c.ContainerRemove(ctx, id, containertypes.RemoveOptions{Force: true})
assert.NilError(t, err)
network.RemoveNoError(ctx, t, c, netName)
_, err = c.NetworkInspect(ctx, netID, types.NetworkInspectOptions{})
assert.Check(t, err != nil, "Network still exists after deletion")
}
// TestWindowsNetworkIsolation validates network isolation between containers on different networks.
// Ensures containers on different networks cannot communicate, validating Windows network driver isolation.
func TestWindowsNetworkIsolation(t *testing.T) {
ctx := setupTest(t)
c := testEnv.APIClient()
// Create two separate NAT networks
net1Name := "isolation-net1"
net2Name := "isolation-net2"
network.CreateNoError(ctx, t, c, net1Name, network.WithDriver("nat"))
defer network.RemoveNoError(ctx, t, c, net1Name)
network.CreateNoError(ctx, t, c, net2Name, network.WithDriver("nat"))
defer network.RemoveNoError(ctx, t, c, net2Name)
// Create container on first network
ctr1Name := "isolated-ctr1"
id1 := container.Run(ctx, t, c,
container.WithName(ctr1Name),
container.WithNetworkMode(net1Name),
)
defer c.ContainerRemove(ctx, id1, containertypes.RemoveOptions{Force: true})
ctr1Info := container.Inspect(ctx, t, c, id1)
ctr1IP := ctr1Info.NetworkSettings.Networks[net1Name].IPAddress
assert.Check(t, ctr1IP != "", "Container IP not assigned")
// Create container on second network and try to ping first container
pingCmd := []string{"ping", "-n", "1", "-w", "2000", ctr1IP}
attachCtx, cancel := context.WithTimeout(ctx, 15*time.Second)
defer cancel()
res := container.RunAttach(attachCtx, t, c,
container.WithCmd(pingCmd...),
container.WithNetworkMode(net2Name),
)
defer c.ContainerRemove(ctx, res.ContainerID, containertypes.RemoveOptions{Force: true})
// Ping should fail, demonstrating network isolation
assert.Check(t, res.ExitCode != 0, "Ping succeeded unexpectedly - networks are not isolated")
// Windows ping failure can have various error messages, but we should see some indication of failure
stdout := res.Stdout.String()
stderr := res.Stderr.String()
// Check for common Windows ping failure indicators
hasFailureIndicator := strings.Contains(stdout, "Destination host unreachable") ||
strings.Contains(stdout, "Request timed out") ||
strings.Contains(stdout, "100% loss") ||
strings.Contains(stdout, "Lost = 1") ||
strings.Contains(stderr, "unreachable") ||
strings.Contains(stderr, "timeout")
assert.Check(t, hasFailureIndicator,
"Expected ping failure indicators not found. Exit code: %d, stdout: %q, stderr: %q",
res.ExitCode, stdout, stderr)
}
// TestWindowsNetworkEndpointManagement validates endpoint creation and management on Windows networks.
// Tests that multiple containers can be created and managed on the same network.
func TestWindowsNetworkEndpointManagement(t *testing.T) {
ctx := setupTest(t)
c := testEnv.APIClient()
netName := "endpoint-test-nat"
network.CreateNoError(ctx, t, c, netName, network.WithDriver("nat"))
defer network.RemoveNoError(ctx, t, c, netName)
// Create multiple containers on the same network
const numContainers = 3
containerIDs := make([]string, numContainers)
for i := 0; i < numContainers; i++ {
ctrName := fmt.Sprintf("endpoint-ctr-%d", i)
id := container.Run(ctx, t, c,
container.WithName(ctrName),
container.WithNetworkMode(netName),
)
containerIDs[i] = id
defer c.ContainerRemove(ctx, id, containertypes.RemoveOptions{Force: true})
}
netInfo, err := c.NetworkInspect(ctx, netName, types.NetworkInspectOptions{})
assert.NilError(t, err)
assert.Check(t, is.Equal(len(netInfo.Containers), numContainers),
"Expected %d containers, got %d", numContainers, len(netInfo.Containers))
// Verify each container has network connectivity to others
for i := 0; i < numContainers-1; i++ {
targetName := fmt.Sprintf("endpoint-ctr-%d", i)
pingCmd := []string{"ping", "-n", "1", "-w", "3000", targetName}
sourceName := fmt.Sprintf("endpoint-ctr-%d", i+1)
attachCtx, cancel := context.WithTimeout(ctx, 15*time.Second)
defer cancel()
res := container.RunAttach(attachCtx, t, c,
container.WithName(fmt.Sprintf("%s-pinger", sourceName)),
container.WithCmd(pingCmd...),
container.WithNetworkMode(netName),
)
defer c.ContainerRemove(ctx, res.ContainerID, containertypes.RemoveOptions{Force: true})
assert.Check(t, is.Equal(res.ExitCode, 0),
"Container %s failed to ping %s", sourceName, targetName)
}
}


@@ -0,0 +1,71 @@
package service
import (
stdnet "net"
"strings"
"testing"
"time"
swarmtypes "github.com/docker/docker/api/types/swarm"
"github.com/docker/docker/integration/internal/swarm"
"github.com/docker/docker/internal/testutils/networking"
"gotest.tools/v3/assert"
"gotest.tools/v3/icmd"
"gotest.tools/v3/poll"
"gotest.tools/v3/skip"
)
func TestRestoreIngressRulesOnFirewalldReload(t *testing.T) {
skip.If(t, testEnv.IsRemoteDaemon)
skip.If(t, testEnv.IsRootless, "rootless mode doesn't support Swarm-mode")
//skip.If(t, testEnv.FirewallBackendDriver() == "iptables")
skip.If(t, !networking.FirewalldRunning(), "firewalld is needed to test restoration of ingress rules")
ctx := setupTest(t)
// Check the published port is accessible.
checkHTTP := func(_ poll.LogT) poll.Result {
res := icmd.RunCommand("curl", "-v", "-o", "/dev/null", "-w", "%{http_code}\n",
"http://"+stdnet.JoinHostPort("localhost", "8080"))
// A "404 Not Found" means the server responded, but it's got nothing to serve.
if !strings.Contains(res.Stdout(), "404") {
return poll.Continue("404 - not found in: %s, %+v", res.Stdout(), res)
}
return poll.Success()
}
d := swarm.NewSwarm(ctx, t, testEnv)
defer d.Stop(t)
c := d.NewClientT(t)
defer c.Close()
serviceID := swarm.CreateService(ctx, t, d,
swarm.ServiceWithName("test-ingress-on-firewalld-reload"),
swarm.ServiceWithCommand([]string{"httpd", "-f"}),
swarm.ServiceWithEndpoint(&swarmtypes.EndpointSpec{
Ports: []swarmtypes.PortConfig{
{
Protocol: "tcp",
TargetPort: 80,
PublishedPort: 8080,
PublishMode: swarmtypes.PortConfigPublishModeIngress,
},
},
}),
)
defer func() {
err := c.ServiceRemove(ctx, serviceID)
assert.NilError(t, err)
}()
t.Log("Waiting for the service to start")
poll.WaitOn(t, swarm.RunningTasksCount(ctx, c, serviceID, 1), swarm.ServicePoll)
t.Log("Checking http access to the service")
poll.WaitOn(t, checkHTTP, poll.WithTimeout(30*time.Second))
t.Log("Firewalld reload")
networking.FirewalldReload(t, d)
t.Log("Checking http access to the service")
// It takes a while before this works ...
poll.WaitOn(t, checkHTTP, poll.WithTimeout(30*time.Second))
}


@@ -0,0 +1,32 @@
// FIXME(thaJeztah): remove once we are a module; the go:build directive prevents go from downgrading language version to go1.16:
//go:build go1.23
package iterutil
import (
"iter"
"maps"
)
// SameValues checks if a and b yield the same values, independent of order.
func SameValues[T comparable](a, b iter.Seq[T]) bool {
m, n := make(map[T]int), make(map[T]int)
for v := range a {
m[v]++
}
for v := range b {
n[v]++
}
return maps.Equal(m, n)
}
// Deref adapts an iterator of pointers to an iterator of values.
func Deref[T any, P *T](s iter.Seq[P]) iter.Seq[T] {
return func(yield func(T) bool) {
for p := range s {
if !yield(*p) {
return
}
}
}
}


@@ -0,0 +1,31 @@
// FIXME(thaJeztah): remove once we are a module; the go:build directive prevents go from downgrading language version to go1.16:
//go:build go1.23
package iterutil
import (
"slices"
"testing"
"gotest.tools/v3/assert"
)
func TestSameValues(t *testing.T) {
a := []int{1, 2, 3, 4, 3}
b := []int{3, 4, 3, 2, 1}
c := []int{1, 2, 3, 4}
assert.Check(t, SameValues(slices.Values(a), slices.Values(a)))
assert.Check(t, SameValues(slices.Values(c), slices.Values(c)))
assert.Check(t, SameValues(slices.Values(a), slices.Values(b)))
assert.Check(t, !SameValues(slices.Values(a), slices.Values(c)))
}
func TestDeref(t *testing.T) {
a := make([]*int, 3)
for i := range a {
// Go 1.22+ per-iteration loop variables: each &i points at a distinct i.
a[i] = &i
}
b := slices.Collect(Deref(slices.Values(a)))
assert.DeepEqual(t, b, []int{0, 1, 2})
}


@@ -0,0 +1,60 @@
package networking
import (
"fmt"
"os/exec"
"regexp"
"strings"
"testing"
"time"
"github.com/docker/docker/testutil/daemon"
"golang.org/x/net/context"
"gotest.tools/v3/assert"
"gotest.tools/v3/icmd"
"gotest.tools/v3/poll"
)
func FirewalldRunning() bool {
state, err := exec.Command("firewall-cmd", "--state").CombinedOutput()
return err == nil && strings.TrimSpace(string(state)) == "running"
}
func extractLogTime(s string) (time.Time, error) {
// time="2025-07-15T13:46:13.414214418Z" level=info msg=""
re := regexp.MustCompile(`time="([^"]+)"`)
matches := re.FindStringSubmatch(s)
if len(matches) < 2 {
return time.Time{}, fmt.Errorf("timestamp not found in log line: %s, matches: %+v", s, matches)
}
return time.Parse(time.RFC3339Nano, matches[1])
}
// FirewalldReload reloads firewalld and waits for the daemon to re-create its rules.
// It's a no-op if firewalld is not running, and the test fails if the reload does
// not complete.
func FirewalldReload(t *testing.T, d *daemon.Daemon) {
t.Helper()
if !FirewalldRunning() {
return
}
timeBeforeReload := time.Now()
res := icmd.RunCommand("firewall-cmd", "--reload")
assert.NilError(t, res.Error)
ctx := context.Background()
poll.WaitOn(t, d.PollCheckLogs(ctx, func(s string) bool {
if !strings.Contains(s, "Firewalld reload completed") {
return false
}
lastReload, err := extractLogTime(s)
if err != nil {
return false
}
if lastReload.After(timeBeforeReload) {
return true
}
return false
}))
}


@@ -0,0 +1,47 @@
package networking
import (
"reflect"
"testing"
"time"
)
func TestExtractLogTime(t *testing.T) {
tests := []struct {
name string
s string
want time.Time
wantErr bool
}{
{
name: "valid time",
s: `time="2025-07-15T13:46:13.414214418Z" level=info msg=""`,
want: time.Date(2025, 7, 15, 13, 46, 13, 414214418, time.UTC),
wantErr: false,
},
{
name: "invalid format",
s: `time="invalid-time-format" level=info msg=""`,
want: time.Time{},
wantErr: true,
},
{
name: "missing time",
s: `level=info msg=""`,
want: time.Time{},
wantErr: true,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got, err := extractLogTime(tt.s)
if (err != nil) != tt.wantErr {
t.Errorf("extractLogTime() error = %v, wantErr %v", err, tt.wantErr)
return
}
if !reflect.DeepEqual(got, tt.want) {
t.Errorf("extractLogTime() got = %v, want %v", got, tt.want)
}
})
}
}


@@ -1,3 +1,6 @@
// FIXME(thaJeztah): remove once we are a module; the go:build directive prevents go from downgrading language version to go1.16:
//go:build go1.23
package libnetwork
//go:generate protoc -I=. -I=../vendor/ --gogofaster_out=import_path=github.com/docker/docker/libnetwork:. agent.proto
@@ -7,10 +10,13 @@ import (
"encoding/json"
"fmt"
"net"
"net/netip"
"slices"
"sort"
"sync"
"github.com/containerd/log"
"github.com/docker/docker/internal/iterutil"
"github.com/docker/docker/libnetwork/cluster"
"github.com/docker/docker/libnetwork/discoverapi"
"github.com/docker/docker/libnetwork/driverapi"
@@ -490,17 +496,19 @@ func (n *Network) Services() map[string]ServiceInfo {
// Walk through the driver's tables, have the driver decode the entries
// and return the tuple {ep ID, value}. value is a string that conveys
// relevant info about the endpoint.
-	for _, table := range n.driverTables {
-		if table.objType != driverapi.EndpointObject {
-			continue
-		}
-		for key, value := range agent.networkDB.GetTableByNetwork(table.name, nwID) {
-			epID, info := d.DecodeTableEntry(table.name, key, value.Value)
-			if ep, ok := eps[epID]; !ok {
-				log.G(context.TODO()).Errorf("Inconsistent driver and libnetwork state for endpoint %s", epID)
-			} else {
-				ep.info = info
-				eps[epID] = ep
+	if d, ok := d.(driverapi.TableWatcher); ok {
+		for _, table := range n.driverTables {
+			if table.objType != driverapi.EndpointObject {
+				continue
+			}
+			for key, value := range agent.networkDB.GetTableByNetwork(table.name, nwID) {
+				epID, info := d.DecodeTableEntry(table.name, key, value.Value)
+				if ep, ok := eps[epID]; !ok {
+					log.G(context.TODO()).Errorf("Inconsistent driver and libnetwork state for endpoint %s", epID)
+				} else {
+					ep.info = info
+					eps[epID] = ep
+				}
 			}
 		}
 	}
@@ -813,33 +821,14 @@ func (n *Network) handleDriverTableEvent(ev events.Event) {
log.G(context.TODO()).Errorf("Could not resolve driver %s while handling driver table event: %v", n.networkType, err)
return
}
var (
etype driverapi.EventType
tname string
key string
value []byte
)
switch event := ev.(type) {
case networkdb.CreateEvent:
tname = event.Table
key = event.Key
value = event.Value
etype = driverapi.Create
case networkdb.DeleteEvent:
tname = event.Table
key = event.Key
value = event.Value
etype = driverapi.Delete
case networkdb.UpdateEvent:
tname = event.Table
key = event.Key
value = event.Value
etype = driverapi.Update
ed, ok := d.(driverapi.TableWatcher)
if !ok {
log.G(context.TODO()).Errorf("Could not notify driver %s about table event: driver does not implement TableWatcher interface", n.networkType)
return
}
d.EventNotify(etype, n.ID(), tname, key, value)
event := ev.(networkdb.WatchEvent)
ed.EventNotify(n.ID(), event.Table, event.Key, event.Prev, event.Value)
}
func (c *Controller) handleNodeTableEvent(ev events.Event) {
@@ -848,13 +837,14 @@ func (c *Controller) handleNodeTableEvent(ev events.Event) {
isAdd bool
nodeAddr networkdb.NodeAddr
)
switch event := ev.(type) {
case networkdb.CreateEvent:
event := ev.(networkdb.WatchEvent)
switch {
case event.IsCreate():
value = event.Value
isAdd = true
case networkdb.DeleteEvent:
value = event.Value
case networkdb.UpdateEvent:
case event.IsDelete():
value = event.Prev
case event.IsUpdate():
log.G(context.TODO()).Errorf("Unexpected update node table event = %#v", event)
}
@@ -866,95 +856,139 @@ func (c *Controller) handleNodeTableEvent(ev events.Event) {
c.processNodeDiscovery([]net.IP{nodeAddr.Addr}, isAdd)
}
type endpointEvent struct {
EndpointRecord
// Virtual IP of the service to which this endpoint belongs.
VirtualIP netip.Addr
// IP assigned to this endpoint.
EndpointIP netip.Addr
}
func unmarshalEndpointRecord(data []byte) (*endpointEvent, error) {
var epRec EndpointRecord
if err := proto.Unmarshal(data, &epRec); err != nil {
return nil, fmt.Errorf("failed to unmarshal endpoint record: %w", err)
}
vip, _ := netip.ParseAddr(epRec.VirtualIP)
eip, _ := netip.ParseAddr(epRec.EndpointIP)
if epRec.Name == "" || !eip.IsValid() {
return nil, fmt.Errorf("invalid endpoint name/ip in service table event %s", data)
}
return &endpointEvent{
EndpointRecord: epRec,
VirtualIP: vip,
EndpointIP: eip,
}, nil
}
// EquivalentTo returns true if ev is semantically equivalent to other.
func (ev *endpointEvent) EquivalentTo(other *endpointEvent) bool {
return ev.Name == other.Name &&
ev.ServiceName == other.ServiceName &&
ev.ServiceID == other.ServiceID &&
ev.VirtualIP == other.VirtualIP &&
ev.EndpointIP == other.EndpointIP &&
ev.ServiceDisabled == other.ServiceDisabled &&
iterutil.SameValues(
iterutil.Deref(slices.Values(ev.IngressPorts)),
iterutil.Deref(slices.Values(other.IngressPorts))) &&
iterutil.SameValues(slices.Values(ev.Aliases), slices.Values(other.Aliases)) &&
iterutil.SameValues(slices.Values(ev.TaskAliases), slices.Values(other.TaskAliases))
}
func (c *Controller) handleEpTableEvent(ev events.Event) {
var (
nid string
eid string
value []byte
epRec EndpointRecord
)
event := ev.(networkdb.WatchEvent)
nid := event.NetworkID
eid := event.Key
switch event := ev.(type) {
case networkdb.CreateEvent:
nid = event.NetworkID
eid = event.Key
value = event.Value
case networkdb.DeleteEvent:
nid = event.NetworkID
eid = event.Key
value = event.Value
case networkdb.UpdateEvent:
nid = event.NetworkID
eid = event.Key
value = event.Value
default:
log.G(context.TODO()).Errorf("Unexpected update service table event = %#v", event)
return
var prev, epRec *endpointEvent
if event.Prev != nil {
var err error
prev, err = unmarshalEndpointRecord(event.Prev)
if err != nil {
log.G(context.TODO()).WithError(err).Error("error unmarshaling previous value from service table event")
return
}
}
err := proto.Unmarshal(value, &epRec)
if err != nil {
log.G(context.TODO()).WithError(err).Error("Failed to unmarshal service table value")
return
if event.Value != nil {
var err error
epRec, err = unmarshalEndpointRecord(event.Value)
if err != nil {
log.G(context.TODO()).WithError(err).Error("error unmarshaling service table event")
return
}
}
containerName := epRec.Name
svcName := epRec.ServiceName
svcID := epRec.ServiceID
vip := net.ParseIP(epRec.VirtualIP)
ip := net.ParseIP(epRec.EndpointIP)
ingressPorts := epRec.IngressPorts
serviceAliases := epRec.Aliases
taskAliases := epRec.TaskAliases
logger := log.G(context.TODO()).WithFields(log.Fields{
"nid": nid,
"eid": eid,
"T": fmt.Sprintf("%T", ev),
"R": epRec,
"evt": event,
"R": epRec,
"prev": prev,
})
if containerName == "" || ip == nil {
logger.Errorf("Invalid endpoint name/ip received while handling service table event %s", value)
return
}
logger.Debug("handleEpTableEvent")
switch ev.(type) {
case networkdb.CreateEvent, networkdb.UpdateEvent:
if svcID != "" {
if prev != nil {
if epRec != nil && prev.EquivalentTo(epRec) {
// Avoid flapping if we would otherwise remove a service
// binding then immediately replace it with an equivalent one.
return
}
if prev.ServiceID != "" {
// This is a remote task part of a service
if !prev.ServiceDisabled {
err := c.rmServiceBinding(prev.ServiceName, prev.ServiceID, nid, eid,
prev.Name, prev.VirtualIP.AsSlice(), prev.IngressPorts,
prev.Aliases, prev.TaskAliases, prev.EndpointIP.AsSlice(),
"handleEpTableEvent", true, true)
if err != nil {
logger.WithError(err).Error("failed removing service binding")
}
}
} else {
// This is a remote container simply attached to an attachable network
err := c.delContainerNameResolution(nid, eid, prev.Name, prev.TaskAliases,
prev.EndpointIP.AsSlice(), "handleEpTableEvent")
if err != nil {
logger.WithError(err).Errorf("failed removing container name resolution")
}
}
}
if epRec != nil {
if epRec.ServiceID != "" {
// This is a remote task part of a service
if epRec.ServiceDisabled {
if err := c.rmServiceBinding(svcName, svcID, nid, eid, containerName, vip, ingressPorts, serviceAliases, taskAliases, ip, "handleEpTableEvent", true, false); err != nil {
logger.WithError(err).Error("failed disabling service binding")
return
// Don't double-remove a service binding
if prev == nil || prev.ServiceID != epRec.ServiceID || !prev.ServiceDisabled {
err := c.rmServiceBinding(epRec.ServiceName, epRec.ServiceID,
nid, eid, epRec.Name, epRec.VirtualIP.AsSlice(),
epRec.IngressPorts, epRec.Aliases, epRec.TaskAliases,
epRec.EndpointIP.AsSlice(), "handleEpTableEvent", true, false)
if err != nil {
logger.WithError(err).Error("failed disabling service binding")
return
}
}
} else {
if err := c.addServiceBinding(svcName, svcID, nid, eid, containerName, vip, ingressPorts, serviceAliases, taskAliases, ip, "handleEpTableEvent"); err != nil {
err := c.addServiceBinding(epRec.ServiceName, epRec.ServiceID, nid, eid,
epRec.Name, epRec.VirtualIP.AsSlice(), epRec.IngressPorts,
epRec.Aliases, epRec.TaskAliases, epRec.EndpointIP.AsSlice(),
"handleEpTableEvent")
if err != nil {
logger.WithError(err).Error("failed adding service binding")
return
}
}
} else {
// This is a remote container simply attached to an attachable network
if err := c.addContainerNameResolution(nid, eid, containerName, taskAliases, ip, "handleEpTableEvent"); err != nil {
err := c.addContainerNameResolution(nid, eid, epRec.Name, epRec.TaskAliases,
epRec.EndpointIP.AsSlice(), "handleEpTableEvent")
if err != nil {
logger.WithError(err).Errorf("failed adding container name resolution")
}
}
case networkdb.DeleteEvent:
if svcID != "" {
// This is a remote task part of a service
if err := c.rmServiceBinding(svcName, svcID, nid, eid, containerName, vip, ingressPorts, serviceAliases, taskAliases, ip, "handleEpTableEvent", true, true); err != nil {
logger.WithError(err).Error("failed removing service binding")
return
}
} else {
// This is a remote container simply attached to an attachable network
if err := c.delContainerNameResolution(nid, eid, containerName, taskAliases, ip, "handleEpTableEvent"); err != nil {
logger.WithError(err).Errorf("failed removing container name resolution")
}
}
}
}
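The refactor above collapses the three `networkdb` event types into a single `WatchEvent` carrying both the previous and current table values, and `handleEpTableEvent` now distinguishes operations by nil checks on `event.Prev` and `event.Value`. A minimal sketch of that discrimination, using a hypothetical stand-in struct rather than the real `networkdb.WatchEvent`:

```go
package main

import "fmt"

// WatchEvent is a hypothetical stand-in for networkdb.WatchEvent: Prev is
// nil for a create, Value is nil for a delete, and both are set for an
// update (matching the IsCreate/IsDelete/IsUpdate checks in the diff).
type WatchEvent struct {
	Table, NetworkID, Key string
	Prev, Value           []byte
}

// classify recovers the operation kind from the Prev/Value pair, mirroring
// the nil checks handleEpTableEvent performs before unmarshaling each side.
func classify(ev WatchEvent) string {
	switch {
	case ev.Prev == nil && ev.Value != nil:
		return "create"
	case ev.Prev != nil && ev.Value == nil:
		return "delete"
	case ev.Prev != nil && ev.Value != nil:
		return "update"
	default:
		return "invalid"
	}
}

func main() {
	fmt.Println(classify(WatchEvent{Key: "ep1", Value: []byte("v1")}))
	fmt.Println(classify(WatchEvent{Key: "ep1", Prev: []byte("v1")}))
	fmt.Println(classify(WatchEvent{Key: "ep1", Prev: []byte("v1"), Value: []byte("v2")}))
}
```

Carrying `Prev` in the same event is what lets the new code remove a stale service binding and add its replacement in one pass, instead of waiting for separate delete and create events.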

libnetwork/agent_test.go

@@ -0,0 +1,93 @@
// FIXME(thaJeztah): remove once we are a module; the go:build directive prevents go from downgrading language version to go1.16:
//go:build go1.23
package libnetwork
import (
"net/netip"
"slices"
"testing"
"gotest.tools/v3/assert"
)
func TestEndpointEvent_EquivalentTo(t *testing.T) {
assert.Check(t, (&endpointEvent{}).EquivalentTo(&endpointEvent{}))
a := endpointEvent{
EndpointRecord: EndpointRecord{
Name: "foo",
ServiceName: "bar",
ServiceID: "baz",
IngressPorts: []*PortConfig{
{
Protocol: ProtocolTCP,
TargetPort: 80,
},
{
Name: "dns",
Protocol: ProtocolUDP,
TargetPort: 5353,
PublishedPort: 53,
},
},
},
VirtualIP: netip.MustParseAddr("10.0.0.42"),
EndpointIP: netip.MustParseAddr("192.168.69.42"),
}
assert.Check(t, a.EquivalentTo(&a))
reflexiveEquiv := func(a, b *endpointEvent) bool {
t.Helper()
assert.Check(t, a.EquivalentTo(b) == b.EquivalentTo(a), "reflexive equivalence")
return a.EquivalentTo(b)
}
b := a
b.ServiceDisabled = true
assert.Check(t, !reflexiveEquiv(&a, &b), "differing by ServiceDisabled")
c := a
c.IngressPorts = slices.Clone(a.IngressPorts)
slices.Reverse(c.IngressPorts)
assert.Check(t, reflexiveEquiv(&a, &c), "IngressPorts order should not matter")
d := a
d.IngressPorts = append(d.IngressPorts, a.IngressPorts[0])
assert.Check(t, !reflexiveEquiv(&a, &d), "Differing number of copies of IngressPort entries should not be equivalent")
d.IngressPorts = a.IngressPorts[:1]
assert.Check(t, !reflexiveEquiv(&a, &d), "Removing an IngressPort entry should not be equivalent")
e := a
e.Aliases = []string{"alias1", "alias2"}
assert.Check(t, !reflexiveEquiv(&a, &e), "Differing Aliases should not be equivalent")
f := a
f.TaskAliases = []string{"taskalias1", "taskalias2"}
assert.Check(t, !reflexiveEquiv(&a, &f), "Adding TaskAliases should not be equivalent")
g := a
g.TaskAliases = []string{"taskalias2", "taskalias1"}
assert.Check(t, reflexiveEquiv(&f, &g), "TaskAliases order should not matter")
g.TaskAliases = g.TaskAliases[:1]
assert.Check(t, !reflexiveEquiv(&f, &g), "Differing number of TaskAliases should not be equivalent")
h := a
h.EndpointIP = netip.MustParseAddr("192.168.69.43")
assert.Check(t, !reflexiveEquiv(&a, &h), "Differing EndpointIP should not be equivalent")
i := a
i.VirtualIP = netip.MustParseAddr("10.0.0.69")
assert.Check(t, !reflexiveEquiv(&a, &i), "Differing VirtualIP should not be equivalent")
j := a
j.ServiceID = "qux"
assert.Check(t, !reflexiveEquiv(&a, &j), "Differing ServiceID should not be equivalent")
k := a
k.ServiceName = "quux"
assert.Check(t, !reflexiveEquiv(&a, &k), "Differing ServiceName should not be equivalent")
l := a
l.Name = "aaaaa"
assert.Check(t, !reflexiveEquiv(&a, &l), "Differing Name should not be equivalent")
}
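The tests above rely on `EquivalentTo` comparing `IngressPorts` and the alias slices as unordered multisets via `iterutil.SameValues` (order must not matter, but multiplicity must). A hypothetical slice-based stand-in for that helper, to illustrate the semantics being asserted:

```go
package main

import "fmt"

// sameValues reports whether a and b hold the same values with the same
// multiplicities, ignoring order. It is an illustrative stand-in for
// internal/iterutil.SameValues, not the real implementation.
func sameValues[T comparable](a, b []T) bool {
	if len(a) != len(b) {
		return false
	}
	counts := make(map[T]int, len(a))
	for _, v := range a {
		counts[v]++
	}
	for _, v := range b {
		if counts[v] == 0 {
			return false
		}
		counts[v]--
	}
	return true
}

func main() {
	fmt.Println(sameValues([]string{"a", "b"}, []string{"b", "a"})) // order ignored
	fmt.Println(sameValues([]string{"a"}, []string{"a", "a"}))      // multiplicity respected
}
```

This matches the test cases above: reversing `IngressPorts` stays equivalent, while duplicating or dropping an entry does not.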


@@ -232,7 +232,7 @@ func (h *Bitmap) IsSet(ordinal uint64) bool {
}
// set/reset the bit
func (h *Bitmap) set(ordinal, start, end uint64, any bool, release bool, serial bool) (uint64, error) {
func (h *Bitmap) set(ordinal, start, end uint64, isAvailable bool, release bool, serial bool) (uint64, error) {
var (
bitPos uint64
bytePos uint64
@@ -248,7 +248,7 @@ func (h *Bitmap) set(ordinal, start, end uint64, any bool, release bool, serial
if release {
bytePos, bitPos = ordinalToPos(ordinal)
} else {
if any {
if isAvailable {
bytePos, bitPos, err = getAvailableFromCurrent(h.head, start, curr, end)
ret = posToOrdinal(bytePos, bitPos)
if err == nil {


@@ -80,13 +80,6 @@ func watchTableEntries(w http.ResponseWriter, r *http.Request) {
}
func handleTableEvents(tableName string, ch *events.Channel) {
var (
// nid string
eid string
value []byte
isAdd bool
)
log.G(context.TODO()).Infof("Started watching table:%s", tableName)
for {
select {
@@ -95,27 +88,17 @@ func handleTableEvents(tableName string, ch *events.Channel) {
return
case evt := <-ch.C:
log.G(context.TODO()).Infof("Recevied new event on:%s", tableName)
switch event := evt.(type) {
case networkdb.CreateEvent:
// nid = event.NetworkID
eid = event.Key
value = event.Value
isAdd = true
case networkdb.DeleteEvent:
// nid = event.NetworkID
eid = event.Key
value = event.Value
isAdd = false
default:
log.G(context.TODO()).Infof("Received new event on:%s", tableName)
event, ok := evt.(networkdb.WatchEvent)
if !ok {
log.G(context.TODO()).Fatalf("Unexpected table event = %#v", event)
}
if isAdd {
// log.G(ctx).Infof("Add %s %s", tableName, eid)
clientWatchTable[tableName].entries[eid] = string(value)
if event.Value != nil {
// log.G(ctx).Infof("Add %s %s", tableName, event.Key)
clientWatchTable[tableName].entries[event.Key] = string(event.Value)
} else {
// log.G(ctx).Infof("Del %s %s", tableName, eid)
delete(clientWatchTable[tableName].entries, eid)
// log.G(ctx).Infof("Del %s %s", tableName, event.Key)
delete(clientWatchTable[tableName].entries, event.Key)
}
}
}


@@ -30,13 +30,6 @@ func (d *manager) CreateNetwork(id string, option map[string]interface{}, nInfo
return types.NotImplementedErrorf("not implemented")
}
func (d *manager) EventNotify(etype driverapi.EventType, nid, tableName, key string, value []byte) {
}
func (d *manager) DecodeTableEntry(tablename string, key string, value []byte) (string, map[string]string) {
return "", nil
}
func (d *manager) DeleteNetwork(nid string) error {
return types.NotImplementedErrorf("not implemented")
}


@@ -163,7 +163,7 @@ func New(cfgOptions ...config.Option) (*Controller, error) {
return nil, err
}
setupArrangeUserFilterRule(c)
c.setupPlatformFirewall()
return c, nil
}


@@ -59,11 +59,20 @@ type Driver interface {
// programming that was done so far
RevokeExternalConnectivity(nid, eid string) error
// Type returns the type of this driver, the network type this driver manages
Type() string
// IsBuiltIn returns true if it is a built-in driver
IsBuiltIn() bool
}
// TableWatcher is an optional interface for a network driver.
type TableWatcher interface {
// EventNotify notifies the driver when a CRUD operation has
// happened on a table of its interest as soon as this node
// receives such an event in the gossip layer. This method is
// only invoked for the global scope driver.
EventNotify(event EventType, nid string, tableName string, key string, value []byte)
EventNotify(nid string, tableName string, key string, prev, value []byte)
// DecodeTableEntry passes the driver a key, value pair from table it registered
// with libnetwork. Driver should return {object ID, map[string]string} tuple.
@@ -74,12 +83,6 @@ type Driver interface {
// For example: overlay driver returns the VTEP IP of the host that has the endpoint
// which is shown in 'network inspect --verbose'
DecodeTableEntry(tablename string, key string, value []byte) (string, map[string]string)
// Type returns the type of this driver, the network type this driver manages
Type() string
// IsBuiltIn returns true if it is a built-in driver
IsBuiltIn() bool
}
// NetworkInfo provides a go interface for drivers to provide network
@@ -170,18 +173,6 @@ type IPAMData struct {
AuxAddresses map[string]*net.IPNet
}
// EventType defines a type for the CRUD event
type EventType uint8
const (
// Create event is generated when a table entry is created,
Create EventType = 1 + iota
// Update event is generated when a table entry is updated.
Update
// Delete event is generated when a table entry is deleted.
Delete
)
// ObjectType represents the type of object driver wants to store in libnetwork's networkDB
type ObjectType int
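The diff above moves `EventNotify` and `DecodeTableEntry` out of the mandatory `Driver` interface into an optional `TableWatcher` interface that libnetwork discovers by type assertion (as `handleDriverTableEvent` now does), which is why the many stub `EventNotify`/`DecodeTableEntry` methods are deleted elsewhere in this comparison. A toy sketch of that optional-interface pattern; all names here are illustrative, not the real `driverapi`:

```go
package main

import "fmt"

// Driver is a trimmed-down, illustrative stand-in for driverapi.Driver.
type Driver interface {
	Type() string
}

// TableWatcher mirrors the optional interface introduced above: only
// drivers that care about gossip table events implement it.
type TableWatcher interface {
	EventNotify(nid, tableName, key string, prev, value []byte)
}

type watchingDriver struct{}

func (watchingDriver) Type() string { return "overlay-like" }
func (watchingDriver) EventNotify(nid, tableName, key string, prev, value []byte) {
	fmt.Printf("event on %s/%s key=%s\n", nid, tableName, key)
}

type plainDriver struct{}

func (plainDriver) Type() string { return "bridge-like" }

// notify delivers a table event only to drivers that opted in, using the
// same type-assertion pattern as handleDriverTableEvent; it reports whether
// the event was delivered.
func notify(d Driver, nid, table, key string, prev, value []byte) bool {
	w, ok := d.(TableWatcher)
	if !ok {
		return false
	}
	w.EventNotify(nid, table, key, prev, value)
	return true
}

func main() {
	fmt.Println(notify(watchingDriver{}, "nid1", "ep_table", "ep1", nil, []byte("v")))
	fmt.Println(notify(plainDriver{}, "nid1", "ep_table", "ep1", nil, []byte("v")))
}
```

The design win is that local-scope drivers (bridge, host, null, macvlan, ipvlan) no longer have to carry empty stubs just to satisfy the interface.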


@@ -213,32 +213,32 @@ func ValidateFixedCIDRV6(val string) error {
// Validate performs a static validation on the network configuration parameters.
// Whatever can be assessed a priori before attempting any programming.
func (c *networkConfiguration) Validate() error {
if c.Mtu < 0 {
return ErrInvalidMtu(c.Mtu)
func (ncfg *networkConfiguration) Validate() error {
if ncfg.Mtu < 0 {
return ErrInvalidMtu(ncfg.Mtu)
}
// If bridge v4 subnet is specified
if c.AddressIPv4 != nil {
if ncfg.AddressIPv4 != nil {
// If default gw is specified, it must be part of bridge subnet
if c.DefaultGatewayIPv4 != nil {
if !c.AddressIPv4.Contains(c.DefaultGatewayIPv4) {
if ncfg.DefaultGatewayIPv4 != nil {
if !ncfg.AddressIPv4.Contains(ncfg.DefaultGatewayIPv4) {
return &ErrInvalidGateway{}
}
}
}
if c.EnableIPv6 {
if ncfg.EnableIPv6 {
// If IPv6 is enabled, AddressIPv6 must have been configured.
if c.AddressIPv6 == nil {
if ncfg.AddressIPv6 == nil {
return errdefs.System(errors.New("no IPv6 address was allocated for the bridge"))
}
// AddressIPv6 must be IPv6, and not overlap with the LL subnet prefix.
if err := validateIPv6Subnet(c.AddressIPv6); err != nil {
if err := validateIPv6Subnet(ncfg.AddressIPv6); err != nil {
return err
}
// If a default gw is specified, it must belong to AddressIPv6's subnet
if c.DefaultGatewayIPv6 != nil && !c.AddressIPv6.Contains(c.DefaultGatewayIPv6) {
if ncfg.DefaultGatewayIPv6 != nil && !ncfg.AddressIPv6.Contains(ncfg.DefaultGatewayIPv6) {
return &ErrInvalidGateway{}
}
}
@@ -247,73 +247,73 @@ func (c *networkConfiguration) Validate() error {
}
// Conflicts check if two NetworkConfiguration objects overlap
func (c *networkConfiguration) Conflicts(o *networkConfiguration) error {
func (ncfg *networkConfiguration) Conflicts(o *networkConfiguration) error {
if o == nil {
return errors.New("same configuration")
}
// Also empty, because only one network with empty name is allowed
if c.BridgeName == o.BridgeName {
if ncfg.BridgeName == o.BridgeName {
return errors.New("networks have same bridge name")
}
// They must be in different subnets
if (c.AddressIPv4 != nil && o.AddressIPv4 != nil) &&
(c.AddressIPv4.Contains(o.AddressIPv4.IP) || o.AddressIPv4.Contains(c.AddressIPv4.IP)) {
if (ncfg.AddressIPv4 != nil && o.AddressIPv4 != nil) &&
(ncfg.AddressIPv4.Contains(o.AddressIPv4.IP) || o.AddressIPv4.Contains(ncfg.AddressIPv4.IP)) {
return errors.New("networks have overlapping IPv4")
}
// They must be in different v6 subnets
if (c.AddressIPv6 != nil && o.AddressIPv6 != nil) &&
(c.AddressIPv6.Contains(o.AddressIPv6.IP) || o.AddressIPv6.Contains(c.AddressIPv6.IP)) {
if (ncfg.AddressIPv6 != nil && o.AddressIPv6 != nil) &&
(ncfg.AddressIPv6.Contains(o.AddressIPv6.IP) || o.AddressIPv6.Contains(ncfg.AddressIPv6.IP)) {
return errors.New("networks have overlapping IPv6")
}
return nil
}
func (c *networkConfiguration) fromLabels(labels map[string]string) error {
func (ncfg *networkConfiguration) fromLabels(labels map[string]string) error {
var err error
for label, value := range labels {
switch label {
case BridgeName:
c.BridgeName = value
ncfg.BridgeName = value
case netlabel.DriverMTU:
if c.Mtu, err = strconv.Atoi(value); err != nil {
if ncfg.Mtu, err = strconv.Atoi(value); err != nil {
return parseErr(label, value, err.Error())
}
case netlabel.EnableIPv6:
if c.EnableIPv6, err = strconv.ParseBool(value); err != nil {
if ncfg.EnableIPv6, err = strconv.ParseBool(value); err != nil {
return parseErr(label, value, err.Error())
}
case EnableIPMasquerade:
if c.EnableIPMasquerade, err = strconv.ParseBool(value); err != nil {
if ncfg.EnableIPMasquerade, err = strconv.ParseBool(value); err != nil {
return parseErr(label, value, err.Error())
}
case EnableICC:
if c.EnableICC, err = strconv.ParseBool(value); err != nil {
if ncfg.EnableICC, err = strconv.ParseBool(value); err != nil {
return parseErr(label, value, err.Error())
}
case InhibitIPv4:
if c.InhibitIPv4, err = strconv.ParseBool(value); err != nil {
if ncfg.InhibitIPv4, err = strconv.ParseBool(value); err != nil {
return parseErr(label, value, err.Error())
}
case DefaultBridge:
if c.DefaultBridge, err = strconv.ParseBool(value); err != nil {
if ncfg.DefaultBridge, err = strconv.ParseBool(value); err != nil {
return parseErr(label, value, err.Error())
}
case DefaultBindingIP:
if c.DefaultBindingIP = net.ParseIP(value); c.DefaultBindingIP == nil {
if ncfg.DefaultBindingIP = net.ParseIP(value); ncfg.DefaultBindingIP == nil {
return parseErr(label, value, "nil ip")
}
case netlabel.ContainerIfacePrefix:
c.ContainerIfacePrefix = value
ncfg.ContainerIfacePrefix = value
case netlabel.HostIPv4:
if c.HostIPv4 = net.ParseIP(value); c.HostIPv4 == nil {
if ncfg.HostIPv4 = net.ParseIP(value); ncfg.HostIPv4 == nil {
return parseErr(label, value, "nil ip")
}
case netlabel.HostIPv6:
if c.HostIPv6 = net.ParseIP(value); c.HostIPv6 == nil {
if ncfg.HostIPv6 = net.ParseIP(value); ncfg.HostIPv6 == nil {
return parseErr(label, value, "nil ip")
}
}
@@ -483,6 +483,8 @@ func (d *driver) configure(option map[string]interface{}) error {
d.config = config
d.Unlock()
iptables.OnReloaded(d.handleFirewalldReload)
return d.initStore(option)
}
@@ -528,7 +530,7 @@ func parseNetworkGenericOptions(data interface{}) (*networkConfiguration, error)
return config, err
}
func (c *networkConfiguration) processIPAM(id string, ipamV4Data, ipamV6Data []driverapi.IPAMData) error {
func (ncfg *networkConfiguration) processIPAM(id string, ipamV4Data, ipamV6Data []driverapi.IPAMData) error {
if len(ipamV4Data) > 1 || len(ipamV6Data) > 1 {
return types.ForbiddenErrorf("bridge driver doesn't support multiple subnets")
}
@@ -538,22 +540,22 @@ func (c *networkConfiguration) processIPAM(id string, ipamV4Data, ipamV6Data []d
}
if ipamV4Data[0].Gateway != nil {
c.AddressIPv4 = types.GetIPNetCopy(ipamV4Data[0].Gateway)
ncfg.AddressIPv4 = types.GetIPNetCopy(ipamV4Data[0].Gateway)
}
if gw, ok := ipamV4Data[0].AuxAddresses[DefaultGatewayV4AuxKey]; ok {
c.DefaultGatewayIPv4 = gw.IP
ncfg.DefaultGatewayIPv4 = gw.IP
}
if len(ipamV6Data) > 0 {
c.AddressIPv6 = ipamV6Data[0].Pool
ncfg.AddressIPv6 = ipamV6Data[0].Pool
if ipamV6Data[0].Gateway != nil {
c.AddressIPv6 = types.GetIPNetCopy(ipamV6Data[0].Gateway)
ncfg.AddressIPv6 = types.GetIPNetCopy(ipamV6Data[0].Gateway)
}
if gw, ok := ipamV6Data[0].AuxAddresses[DefaultGatewayV6AuxKey]; ok {
c.DefaultGatewayIPv6 = gw.IP
ncfg.DefaultGatewayIPv6 = gw.IP
}
}
@@ -623,13 +625,6 @@ func (d *driver) NetworkFree(id string) error {
return types.NotImplementedErrorf("not implemented")
}
func (d *driver) EventNotify(etype driverapi.EventType, nid, tableName, key string, value []byte) {
}
func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, map[string]string) {
return "", nil
}
// Create a new network using bridge plugin
func (d *driver) CreateNetwork(id string, option map[string]interface{}, nInfo driverapi.NetworkInfo, ipV4Data, ipV6Data []driverapi.IPAMData) error {
if len(ipV4Data) == 0 || ipV4Data[0].Pool.String() == "0.0.0.0/0" {
@@ -800,12 +795,6 @@ func (d *driver) createNetwork(config *networkConfiguration) (err error) {
// Setup IP6Tables.
{config.EnableIPv6 && d.config.EnableIP6Tables, network.setupIP6Tables},
// We want to track firewalld configuration so that
// if it is started/reloaded, the rules can be applied correctly
{d.config.EnableIPTables, network.setupFirewalld},
// same for IPv6
{config.EnableIPv6 && d.config.EnableIP6Tables, network.setupFirewalld6},
// Setup DefaultGatewayIPv4
{config.DefaultGatewayIPv4 != nil, setupGatewayIPv4},
@@ -1287,16 +1276,15 @@ func (d *driver) Leave(nid, eid string) error {
return EndpointNotFoundError(eid)
}
if !network.config.EnableICC {
if err = d.link(network, endpoint, false); err != nil {
return err
}
}
return nil
}
func (d *driver) ProgramExternalConnectivity(nid, eid string, options map[string]interface{}) error {
// Make sure the network isn't deleted, or the in middle of a firewalld reload, while
// updating its iptables rules.
d.configNetwork.Lock()
defer d.configNetwork.Unlock()
network, err := d.getNetwork(nid)
if err != nil {
return err
@@ -1348,6 +1336,11 @@ func (d *driver) ProgramExternalConnectivity(nid, eid string, options map[string
}
func (d *driver) RevokeExternalConnectivity(nid, eid string) error {
// Make sure this function isn't deleting iptables rules while handleFirewalldReloadNw
// is restoring those same rules.
d.configNetwork.Lock()
defer d.configNetwork.Unlock()
network, err := d.getNetwork(nid)
if err != nil {
return err
@@ -1378,9 +1371,88 @@ func (d *driver) RevokeExternalConnectivity(nid, eid string) error {
return fmt.Errorf("failed to update bridge endpoint %.7s to store: %v", endpoint.id, err)
}
if !network.config.EnableICC {
if err = d.link(network, endpoint, false); err != nil {
return err
}
}
return nil
}
func (d *driver) handleFirewalldReload() {
if !d.config.EnableIPTables && !d.config.EnableIP6Tables {
return
}
d.Lock()
nids := make([]string, 0, len(d.networks))
for _, nw := range d.networks {
nids = append(nids, nw.id)
}
d.Unlock()
for _, nid := range nids {
d.handleFirewalldReloadNw(nid)
}
}
func (d *driver) handleFirewalldReloadNw(nid string) {
// Make sure the network isn't being deleted, and ProgramExternalConnectivity/RevokeExternalConnectivity
// aren't modifying iptables rules, while restoring the rules.
d.configNetwork.Lock()
defer d.configNetwork.Unlock()
nw, err := d.getNetwork(nid)
if err != nil {
return
}
if d.config.EnableIPTables {
if err := nw.setupIP4Tables(nw.config, nw.bridge); err != nil {
log.G(context.TODO()).WithFields(log.Fields{
"network": nw.id,
"error": err,
}).Warn("Failed to restore IPv4 per-port iptables rules on firewalld reload")
}
}
if d.config.EnableIP6Tables {
if err := nw.setupIP6Tables(nw.config, nw.bridge); err != nil {
log.G(context.TODO()).WithFields(log.Fields{
"network": nw.id,
"error": err,
}).Warn("Failed to restore IPv6 per-port iptables rules on firewalld reload")
}
}
nw.portMapper.ReMapAll()
// Restore the inter-network connectivity (INC) rules.
if err := nw.isolateNetwork(true); err != nil {
log.G(context.TODO()).WithFields(log.Fields{
"network": nw.id,
"error": err,
}).Warn("Failed to restore inter-network iptables rules on firewalld reload")
}
// Re-add legacy links - only added during ProgramExternalConnectivity, but legacy
// links are default-bridge-only, and it's not possible to connect a container to
// the default bridge and a user-defined network. So, the default bridge is always
// the gateway and, if there are legacy links configured they need to be set up.
if !nw.config.EnableICC {
nw.Lock()
defer nw.Unlock()
for _, ep := range nw.endpoints {
if err := d.link(nw, ep, true); err != nil {
log.G(context.Background()).WithFields(log.Fields{
"nid": nw.id,
"eid": ep.id,
"error": err,
}).Error("Failed to re-create link on firewalld reload")
}
}
}
log.G(context.TODO()).Info("Restored iptables rules on firewalld reload")
}
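The new `handleFirewalldReload` above is registered once at `configure` time and, on each firewalld reload, snapshots the network IDs under the driver lock before restoring rules per network, replacing the per-network `OnReloaded` callbacks deleted elsewhere in this diff. A miniature of that register-once pattern; `onReloaded` and the `driver` type here are stand-ins, not moby's real `iptables` package:

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// reloadHooks and onReloaded stand in for iptables.OnReloaded registration.
var reloadHooks []func()

func onReloaded(f func()) { reloadHooks = append(reloadHooks, f) }

type driver struct {
	mu       sync.Mutex
	networks map[string]struct{} // nid set, standing in for driver state
}

// handleFirewalldReload snapshots the network IDs under the lock, then
// restores per-network rules outside it, as the driver-level handler does;
// it returns the restored IDs so the sketch is easy to test.
func (d *driver) handleFirewalldReload() []string {
	d.mu.Lock()
	nids := make([]string, 0, len(d.networks))
	for nid := range d.networks {
		nids = append(nids, nid)
	}
	d.mu.Unlock()
	sort.Strings(nids) // deterministic order for this sketch only
	for _, nid := range nids {
		fmt.Println("restoring rules for network", nid)
	}
	return nids
}

func main() {
	d := &driver{networks: map[string]struct{}{"n1": {}, "n2": {}}}
	onReloaded(func() { d.handleFirewalldReload() }) // registered once, at configure time
	for _, h := range reloadHooks {                  // simulate a firewalld reload
		h()
	}
}
```

Snapshotting IDs first keeps the reload path from holding the driver lock while doing per-network work, matching the lock/unlock split in the code above.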
func (d *driver) link(network *bridgeNetwork, endpoint *bridgeEndpoint, enable bool) (retErr error) {
cc := endpoint.containerConfig
ec := endpoint.extConnConfig


@@ -30,13 +30,6 @@ func (d *driver) CreateNetwork(id string, option map[string]interface{}, nInfo d
return types.NotImplementedErrorf("not implemented")
}
func (d *driver) EventNotify(etype driverapi.EventType, nid, tableName, key string, value []byte) {
}
func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, map[string]string) {
return "", nil
}
func (d *driver) DeleteNetwork(nid string) error {
return types.NotImplementedErrorf("not implemented")
}


@@ -43,12 +43,7 @@ func (l *link) Enable() error {
linkFunction := func() error {
return linkContainers(iptables.Append, l.parentIP, l.childIP, l.ports, l.bridge, false)
}
if err := linkFunction(); err != nil {
return err
}
iptables.OnReloaded(func() { _ = linkFunction() })
return nil
return linkFunction()
}
func (l *link) Disable() {


@@ -1,41 +0,0 @@
//go:build linux
package bridge
import (
"errors"
"github.com/docker/docker/libnetwork/iptables"
)
func (n *bridgeNetwork) setupFirewalld(config *networkConfiguration, i *bridgeInterface) error {
d := n.driver
d.Lock()
driverConfig := d.config
d.Unlock()
// Sanity check.
if !driverConfig.EnableIPTables {
return errors.New("no need to register firewalld hooks, iptables is disabled")
}
iptables.OnReloaded(func() { n.setupIP4Tables(config, i) })
iptables.OnReloaded(n.portMapper.ReMapAll)
return nil
}
func (n *bridgeNetwork) setupFirewalld6(config *networkConfiguration, i *bridgeInterface) error {
d := n.driver
d.Lock()
driverConfig := d.config
d.Unlock()
// Sanity check.
if !driverConfig.EnableIP6Tables {
return errors.New("no need to register firewalld hooks, ip6tables is disabled")
}
iptables.OnReloaded(func() { n.setupIP6Tables(config, i) })
iptables.OnReloaded(n.portMapperV6.ReMapAll)
return nil
}


@@ -90,11 +90,11 @@ func setupIPChains(config configuration, version iptables.IPVersion) (natChain *
}
}()
if err := iptable.AddReturnRule(IsolationChain1); err != nil {
if err := iptable.AddReturnRule(iptables.Filter, IsolationChain1); err != nil {
return nil, nil, nil, nil, err
}
if err := iptable.AddReturnRule(IsolationChain2); err != nil {
if err := iptable.AddReturnRule(iptables.Filter, IsolationChain2); err != nil {
return nil, nil, nil, nil, err
}
@@ -130,6 +130,9 @@ func (n *bridgeNetwork) setupIP6Tables(config *networkConfiguration, i *bridgeIn
return errors.New("Cannot program chains, EnableIP6Tables is disabled")
}
if i.bridgeIPv6 == nil {
return nil
}
maskedAddrv6 := &net.IPNet{
IP: i.bridgeIPv6.IP.Mask(i.bridgeIPv6.Mask),
Mask: i.bridgeIPv6.Mask,
@@ -192,7 +195,7 @@ func (n *bridgeNetwork) setupIPTables(ipVersion iptables.IPVersion, maskedAddr *
}
d.Lock()
err = iptable.EnsureJumpRule("FORWARD", IsolationChain1)
err = iptable.EnsureJumpRule(iptables.Filter, "FORWARD", IsolationChain1)
d.Unlock()
return err
}


@@ -30,13 +30,6 @@ func (d *driver) NetworkFree(id string) error {
return types.NotImplementedErrorf("not implemented")
}
func (d *driver) EventNotify(etype driverapi.EventType, nid, tableName, key string, value []byte) {
}
func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, map[string]string) {
return "", nil
}
func (d *driver) CreateNetwork(id string, option map[string]interface{}, nInfo driverapi.NetworkInfo, ipV4Data, ipV6Data []driverapi.IPAMData) error {
d.Lock()
defer d.Unlock()


@@ -102,10 +102,3 @@ func (d *driver) ProgramExternalConnectivity(nid, eid string, options map[string
func (d *driver) RevokeExternalConnectivity(nid, eid string) error {
return nil
}
func (d *driver) EventNotify(etype driverapi.EventType, nid, tableName, key string, value []byte) {
}
func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, map[string]string) {
return "", nil
}


@@ -15,7 +15,7 @@ type driverTester struct {
d *driver
}
func (dt *driverTester) RegisterDriver(name string, drv driverapi.Driver, cap driverapi.Capability) error {
func (dt *driverTester) RegisterDriver(name string, drv driverapi.Driver, capability driverapi.Capability) error {
if name != testNetworkType {
dt.t.Fatalf("Expected driver register name to be %q. Instead got %q",
testNetworkType, name)


@@ -30,13 +30,6 @@ func (d *driver) CreateNetwork(id string, option map[string]interface{}, nInfo d
return types.NotImplementedErrorf("not implemented")
}
func (d *driver) EventNotify(etype driverapi.EventType, nid, tableName, key string, value []byte) {
}
func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, map[string]string) {
return "", nil
}
func (d *driver) DeleteNetwork(nid string) error {
return types.NotImplementedErrorf("not implemented")
}


@@ -96,10 +96,3 @@ func (d *driver) ProgramExternalConnectivity(nid, eid string, options map[string
func (d *driver) RevokeExternalConnectivity(nid, eid string) error {
return nil
}
func (d *driver) EventNotify(etype driverapi.EventType, nid, tableName, key string, value []byte) {
}
func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, map[string]string) {
return "", nil
}

@@ -15,7 +15,7 @@ type driverTester struct {
d *driver
}
func (dt *driverTester) RegisterDriver(name string, drv driverapi.Driver, cap driverapi.Capability) error {
func (dt *driverTester) RegisterDriver(name string, drv driverapi.Driver, capability driverapi.Capability) error {
if name != testNetworkType {
dt.t.Fatalf("Expected driver register name to be %q. Instead got %q",
testNetworkType, name)

@@ -30,13 +30,6 @@ func (d *driver) CreateNetwork(id string, option map[string]interface{}, nInfo d
return types.NotImplementedErrorf("not implemented")
}
func (d *driver) EventNotify(etype driverapi.EventType, nid, tableName, key string, value []byte) {
}
func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, map[string]string) {
return "", nil
}
func (d *driver) DeleteNetwork(nid string) error {
return types.NotImplementedErrorf("not implemented")
}

@@ -30,13 +30,6 @@ func (d *driver) NetworkFree(id string) error {
return types.NotImplementedErrorf("not implemented")
}
func (d *driver) EventNotify(etype driverapi.EventType, nid, tableName, key string, value []byte) {
}
func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, map[string]string) {
return "", nil
}
func (d *driver) CreateNetwork(id string, option map[string]interface{}, nInfo driverapi.NetworkInfo, ipV4Data, ipV6Data []driverapi.IPAMData) error {
d.Lock()
defer d.Unlock()

@@ -10,8 +10,8 @@ import (
"fmt"
"hash/fnv"
"net"
"net/netip"
"strconv"
"sync"
"syscall"
"github.com/containerd/log"
@@ -89,21 +89,24 @@ func (s *spi) String() string {
return fmt.Sprintf("SPI(FWD: 0x%x, REV: 0x%x)", uint32(s.forward), uint32(s.reverse))
}
type encrMap struct {
nodes map[string][]*spi
sync.Mutex
type encrNode struct {
spi []spi
count int
}
func (e *encrMap) String() string {
e.Lock()
defer e.Unlock()
// encrMap is a map of node IP addresses to their encryption parameters.
//
// Like all Go maps, it is not safe for concurrent use.
type encrMap map[netip.Addr]encrNode
func (e encrMap) String() string {
b := new(bytes.Buffer)
for k, v := range e.nodes {
for k, v := range e {
b.WriteString("\n")
b.WriteString(k)
b.WriteString(k.String())
b.WriteString(":")
b.WriteString("[")
for _, s := range v {
for _, s := range v.spi {
b.WriteString(s.String())
b.WriteString(",")
}
@@ -112,72 +115,30 @@ func (e *encrMap) String() string {
return b.String()
}
func (d *driver) checkEncryption(nid string, rIP net.IP, isLocal, add bool) error {
log.G(context.TODO()).Debugf("checkEncryption(%.7s, %v, %t)", nid, rIP, isLocal)
n := d.network(nid)
if n == nil || !n.secure {
return nil
}
// setupEncryption programs the encryption parameters for secure communication
// between the local node and a remote node.
func (d *driver) setupEncryption(remoteIP netip.Addr) error {
log.G(context.TODO()).Debugf("setupEncryption(%s)", remoteIP)
d.encrMu.Lock()
defer d.encrMu.Unlock()
if len(d.keys) == 0 {
return types.ForbiddenErrorf("encryption key is not present")
}
lIP := d.bindAddress
aIP := d.advertiseAddress
nodes := map[string]net.IP{}
switch {
case isLocal:
if err := d.peerDbNetworkWalk(nid, func(pKey *peerKey, pEntry *peerEntry) bool {
if !aIP.Equal(pEntry.vtep) {
nodes[pEntry.vtep.String()] = pEntry.vtep
}
return false
}); err != nil {
log.G(context.TODO()).Warnf("Failed to retrieve list of participating nodes in overlay network %.5s: %v", nid, err)
}
default:
if len(d.network(nid).endpoints) > 0 {
nodes[rIP.String()] = rIP
}
}
log.G(context.TODO()).Debugf("List of nodes: %s", nodes)
if add {
for _, rIP := range nodes {
if err := setupEncryption(lIP, aIP, rIP, d.secMap, d.keys); err != nil {
log.G(context.TODO()).Warnf("Failed to program network encryption between %s and %s: %v", lIP, rIP, err)
}
}
} else {
if len(nodes) == 0 {
if err := removeEncryption(lIP, rIP, d.secMap); err != nil {
log.G(context.TODO()).Warnf("Failed to remove network encryption between %s and %s: %v", lIP, rIP, err)
}
}
}
return nil
}
// setupEncryption programs the encryption parameters for secure communication
// between the local node and a remote node.
func setupEncryption(localIP, advIP, remoteIP net.IP, em *encrMap, keys []*key) error {
d.mu.Lock()
localIP, advIP := d.bindAddress, d.advertiseAddress
d.mu.Unlock()
log.G(context.TODO()).Debugf("Programming encryption between %s and %s", localIP, remoteIP)
rIPs := remoteIP.String()
indices := make([]*spi, 0, len(keys))
indices := make([]spi, 0, len(d.keys))
for i, k := range keys {
spis := &spi{buildSPI(advIP, remoteIP, k.tag), buildSPI(remoteIP, advIP, k.tag)}
for i, k := range d.keys {
spis := spi{buildSPI(advIP.AsSlice(), remoteIP.AsSlice(), k.tag), buildSPI(remoteIP.AsSlice(), advIP.AsSlice(), k.tag)}
dir := reverse
if i == 0 {
dir = bidir
}
fSA, rSA, err := programSA(localIP, remoteIP, spis, k, dir, true)
fSA, rSA, err := programSA(localIP.AsSlice(), remoteIP.AsSlice(), spis, k, dir, true)
if err != nil {
log.G(context.TODO()).Warn(err)
}
@@ -191,26 +152,36 @@ func setupEncryption(localIP, advIP, remoteIP net.IP, em *encrMap, keys []*key)
}
}
em.Lock()
em.nodes[rIPs] = indices
em.Unlock()
node := d.secMap[remoteIP]
node.spi = indices
node.count++
d.secMap[remoteIP] = node
return nil
}
func removeEncryption(localIP, remoteIP net.IP, em *encrMap) error {
em.Lock()
indices, ok := em.nodes[remoteIP.String()]
em.Unlock()
if !ok {
return nil
func (d *driver) removeEncryption(remoteIP netip.Addr) error {
log.G(context.TODO()).Debugf("removeEncryption(%s)", remoteIP)
d.encrMu.Lock()
defer d.encrMu.Unlock()
var spi []spi
node := d.secMap[remoteIP]
if node.count == 1 {
delete(d.secMap, remoteIP)
spi = node.spi
} else {
node.count--
d.secMap[remoteIP] = node
}
for i, idxs := range indices {
for i, idxs := range spi {
dir := reverse
if i == 0 {
dir = bidir
}
fSA, rSA, err := programSA(localIP, remoteIP, idxs, nil, dir, false)
fSA, rSA, err := programSA(d.bindAddress.AsSlice(), remoteIP.AsSlice(), idxs, nil, dir, false)
if err != nil {
log.G(context.TODO()).Warn(err)
}
@@ -304,7 +275,7 @@ func (d *driver) programInput(vni uint32, add bool) error {
return nil
}
func programSA(localIP, remoteIP net.IP, spi *spi, k *key, dir int, add bool) (fSA *netlink.XfrmState, rSA *netlink.XfrmState, err error) {
func programSA(localIP, remoteIP net.IP, spi spi, k *key, dir int, add bool) (fSA *netlink.XfrmState, rSA *netlink.XfrmState, lastErr error) {
var (
action = "Removing"
xfrmProgram = ns.NlHandle().XfrmStateDel
@@ -330,6 +301,7 @@ func programSA(localIP, remoteIP net.IP, spi *spi, k *key, dir int, add bool) (f
exists, err := saExists(rSA)
if err != nil {
lastErr = err
exists = !add
}
@@ -356,6 +328,7 @@ func programSA(localIP, remoteIP net.IP, spi *spi, k *key, dir int, add bool) (f
exists, err := saExists(fSA)
if err != nil {
lastErr = err
exists = !add
}
@@ -367,7 +340,7 @@ func programSA(localIP, remoteIP net.IP, spi *spi, k *key, dir int, add bool) (f
}
}
return
return fSA, rSA, lastErr
}
// getMinimalIP returns the address in its shortest form
@@ -475,29 +448,15 @@ func buildAeadAlgo(k *key, s int) *netlink.XfrmStateAlgo {
}
}
func (d *driver) secMapWalk(f func(string, []*spi) ([]*spi, bool)) error {
d.secMap.Lock()
for node, indices := range d.secMap.nodes {
idxs, stop := f(node, indices)
if idxs != nil {
d.secMap.nodes[node] = idxs
}
if stop {
break
}
}
d.secMap.Unlock()
return nil
}
func (d *driver) setKeys(keys []*key) error {
d.encrMu.Lock()
defer d.encrMu.Unlock()
// Remove any stale policy, state
clearEncryptionStates()
// Accept the encryption keys and clear any stale encryption map
d.Lock()
d.secMap = encrMap{}
d.keys = keys
d.secMap = &encrMap{nodes: map[string][]*spi{}}
d.Unlock()
log.G(context.TODO()).Debugf("Initial encryption keys: %v", keys)
return nil
}
@@ -505,6 +464,9 @@ func (d *driver) setKeys(keys []*key) error {
// updateKeys allows to add a new key and/or change the primary key and/or prune an existing key
// The primary key is the key used in transmission and will go in first position in the list.
func (d *driver) updateKeys(newKey, primary, pruneKey *key) error {
d.encrMu.Lock()
defer d.encrMu.Unlock()
log.G(context.TODO()).Debugf("Updating Keys. New: %v, Primary: %v, Pruned: %v", newKey, primary, pruneKey)
log.G(context.TODO()).Debugf("Current: %v", d.keys)
@@ -517,9 +479,6 @@ func (d *driver) updateKeys(newKey, primary, pruneKey *key) error {
aIP = d.advertiseAddress
)
d.Lock()
defer d.Unlock()
// add new
if newKey != nil {
d.keys = append(d.keys, newKey)
@@ -545,10 +504,12 @@ func (d *driver) updateKeys(newKey, primary, pruneKey *key) error {
return types.InvalidParameterErrorf("attempting to both make a key (index %d) primary and delete it", priIdx)
}
d.secMapWalk(func(rIPs string, spis []*spi) ([]*spi, bool) {
rIP := net.ParseIP(rIPs)
return updateNodeKey(lIP, aIP, rIP, spis, d.keys, newIdx, priIdx, delIdx), false
})
for rIP, node := range d.secMap {
idxs := updateNodeKey(lIP.AsSlice(), aIP.AsSlice(), rIP.AsSlice(), node.spi, d.keys, newIdx, priIdx, delIdx)
if idxs != nil {
d.secMap[rIP] = encrNode{idxs, node.count}
}
}
// swap primary
if priIdx != -1 {
@@ -574,7 +535,7 @@ func (d *driver) updateKeys(newKey, primary, pruneKey *key) error {
*********************************************************/
// Spis and keys are sorted in such a way that the one in position 0 is the primary
func updateNodeKey(lIP, aIP, rIP net.IP, idxs []*spi, curKeys []*key, newIdx, priIdx, delIdx int) []*spi {
func updateNodeKey(lIP, aIP, rIP net.IP, idxs []spi, curKeys []*key, newIdx, priIdx, delIdx int) []spi {
log.G(context.TODO()).Debugf("Updating keys for node: %s (%d,%d,%d)", rIP, newIdx, priIdx, delIdx)
spis := idxs
@@ -582,7 +543,7 @@ func updateNodeKey(lIP, aIP, rIP net.IP, idxs []*spi, curKeys []*key, newIdx, pr
// add new
if newIdx != -1 {
spis = append(spis, &spi{
spis = append(spis, spi{
forward: buildSPI(aIP, rIP, curKeys[newIdx].tag),
reverse: buildSPI(rIP, aIP, curKeys[newIdx].tag),
})

@@ -4,12 +4,14 @@ package overlay
import (
"context"
"errors"
"fmt"
"net"
"net/netip"
"syscall"
"github.com/containerd/log"
"github.com/docker/docker/libnetwork/driverapi"
"github.com/docker/docker/libnetwork/internal/netiputil"
"github.com/docker/docker/libnetwork/ns"
"github.com/docker/docker/libnetwork/osl"
"github.com/docker/docker/libnetwork/types"
@@ -22,18 +24,24 @@ func (d *driver) Join(nid, eid string, sboxKey string, jinfo driverapi.JoinInfo,
return err
}
n := d.network(nid)
if n == nil {
return fmt.Errorf("could not find network with id %s", nid)
n, unlock, err := d.lockNetwork(nid)
if err != nil {
return err
}
defer unlock()
ep := n.endpoint(eid)
ep := n.endpoints[eid]
if ep == nil {
return fmt.Errorf("could not find endpoint with id %s", eid)
}
if n.secure && len(d.keys) == 0 {
return fmt.Errorf("cannot join secure network: encryption keys not present")
if n.secure {
d.encrMu.Lock()
nkeys := len(d.keys)
d.encrMu.Unlock()
if nkeys == 0 {
return errors.New("cannot join secure network: encryption keys not present")
}
}
nlh := ns.NlHandle()
@@ -51,8 +59,6 @@ func (d *driver) Join(nid, eid string, sboxKey string, jinfo driverapi.JoinInfo,
return fmt.Errorf("network sandbox join failed: %v", err)
}
sbox := n.sandbox()
overlayIfName, containerIfName, err := createVethPair()
if err != nil {
return err
@@ -74,7 +80,7 @@ func (d *driver) Join(nid, eid string, sboxKey string, jinfo driverapi.JoinInfo,
return err
}
if err = sbox.AddInterface(overlayIfName, "veth", osl.WithMaster(s.brName)); err != nil {
if err = n.sbox.AddInterface(overlayIfName, "veth", osl.WithMaster(s.brName)); err != nil {
return fmt.Errorf("could not add veth pair inside the network sandbox: %v", err)
}
@@ -87,7 +93,7 @@ func (d *driver) Join(nid, eid string, sboxKey string, jinfo driverapi.JoinInfo,
return err
}
if err = nlh.LinkSetHardwareAddr(veth, ep.mac); err != nil {
if err = nlh.LinkSetHardwareAddr(veth, ep.mac.AsSlice()); err != nil {
return fmt.Errorf("could not set mac address (%v) to the container interface: %v", ep.mac, err)
}
@@ -95,7 +101,7 @@ func (d *driver) Join(nid, eid string, sboxKey string, jinfo driverapi.JoinInfo,
if sub == s {
continue
}
if err = jinfo.AddStaticRoute(sub.subnetIP, types.NEXTHOP, s.gwIP.IP); err != nil {
if err = jinfo.AddStaticRoute(netiputil.ToIPNet(sub.subnetIP), types.NEXTHOP, s.gwIP.Addr().AsSlice()); err != nil {
log.G(context.TODO()).Errorf("Adding subnet %s static route in network %q failed\n", s.subnetIP, n.id)
}
}
@@ -107,10 +113,8 @@ func (d *driver) Join(nid, eid string, sboxKey string, jinfo driverapi.JoinInfo,
}
}
d.peerAdd(nid, eid, ep.addr.IP, ep.addr.Mask, ep.mac, d.advertiseAddress, false, false, true)
if err = d.checkEncryption(nid, nil, true, true); err != nil {
log.G(context.TODO()).Warn(err)
if err := n.peerAdd(eid, ep.addr, ep.mac, netip.Addr{}); err != nil {
return fmt.Errorf("overlay: failed to add local endpoint to network peer db: %w", err)
}
buf, err := proto.Marshal(&PeerRecord{
@@ -122,7 +126,7 @@ func (d *driver) Join(nid, eid string, sboxKey string, jinfo driverapi.JoinInfo,
return err
}
if err := jinfo.AddTableEntry(ovPeerTable, eid, buf); err != nil {
if err := jinfo.AddTableEntry(OverlayPeerTable, eid, buf); err != nil {
log.G(context.TODO()).Errorf("overlay: Failed adding table entry to joininfo: %v", err)
}
@@ -130,7 +134,7 @@ func (d *driver) Join(nid, eid string, sboxKey string, jinfo driverapi.JoinInfo,
}
func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, map[string]string) {
if tablename != ovPeerTable {
if tablename != OverlayPeerTable {
log.G(context.TODO()).Errorf("DecodeTableEntry: unexpected table name %s", tablename)
return "", nil
}
@@ -146,50 +150,74 @@ func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (s
}
}
func (d *driver) EventNotify(etype driverapi.EventType, nid, tableName, key string, value []byte) {
if tableName != ovPeerTable {
func (d *driver) EventNotify(nid, tableName, key string, prev, value []byte) {
if tableName != OverlayPeerTable {
log.G(context.TODO()).Errorf("Unexpected table notification for table %s received", tableName)
return
}
eid := key
var peer PeerRecord
if err := proto.Unmarshal(value, &peer); err != nil {
log.G(context.TODO()).Errorf("Failed to unmarshal peer record: %v", err)
var prevPeer, newPeer *Peer
if prev != nil {
var err error
prevPeer, err = UnmarshalPeerRecord(prev)
if err != nil {
log.G(context.TODO()).WithError(err).Error("Failed to unmarshal previous peer record")
} else if prevPeer.TunnelEndpointIP == d.advertiseAddress {
// Ignore local peers. We don't add them to the VXLAN
// FDB so don't need to remove them.
prevPeer = nil
}
}
if value != nil {
var err error
newPeer, err = UnmarshalPeerRecord(value)
if err != nil {
log.G(context.TODO()).WithError(err).Error("Failed to unmarshal peer record")
} else if newPeer.TunnelEndpointIP == d.advertiseAddress {
newPeer = nil
}
}
if prevPeer == nil && newPeer == nil {
// Nothing to do! Either the event was for a local peer,
// or unmarshaling failed.
return
}
if prevPeer != nil && newPeer != nil && *prevPeer == *newPeer {
// The update did not materially change the FDB entry.
return
}
// Ignore local peers. We already know about them and they
// should not be added to vxlan fdb.
if net.ParseIP(peer.TunnelEndpointIP).Equal(d.advertiseAddress) {
return
}
addr, err := types.ParseCIDR(peer.EndpointIP)
n, unlock, err := d.lockNetwork(nid)
if err != nil {
log.G(context.TODO()).Errorf("Invalid peer IP %s received in event notify", peer.EndpointIP)
log.G(context.TODO()).WithFields(log.Fields{
"error": err,
"nid": nid,
}).Error("overlay: handling peer event")
return
}
defer unlock()
mac, err := net.ParseMAC(peer.EndpointMAC)
if err != nil {
log.G(context.TODO()).Errorf("Invalid mac %s received in event notify", peer.EndpointMAC)
return
if prevPeer != nil {
if err := n.peerDelete(eid, prevPeer.EndpointIP, prevPeer.EndpointMAC, prevPeer.TunnelEndpointIP); err != nil {
log.G(context.TODO()).WithFields(log.Fields{
"error": err,
"nid": n.id,
"peer": prevPeer,
}).Warn("overlay: failed to delete peer entry")
}
}
vtep := net.ParseIP(peer.TunnelEndpointIP)
if vtep == nil {
log.G(context.TODO()).Errorf("Invalid VTEP %s received in event notify", peer.TunnelEndpointIP)
return
if newPeer != nil {
if err := n.peerAdd(eid, newPeer.EndpointIP, newPeer.EndpointMAC, newPeer.TunnelEndpointIP); err != nil {
log.G(context.TODO()).WithFields(log.Fields{
"error": err,
"nid": n.id,
"peer": newPeer,
}).Warn("overlay: failed to add peer entry")
}
}
if etype == driverapi.Delete {
d.peerDelete(nid, eid, addr.IP, addr.Mask, mac, vtep, false)
return
}
d.peerAdd(nid, eid, addr.IP, addr.Mask, mac, vtep, false, false, false)
}
// Leave method is invoked when a Sandbox detaches from an endpoint.
@@ -198,18 +226,21 @@ func (d *driver) Leave(nid, eid string) error {
return err
}
n := d.network(nid)
if n == nil {
return fmt.Errorf("could not find network with id %s", nid)
n, unlock, err := d.lockNetwork(nid)
if err != nil {
return err
}
defer unlock()
ep := n.endpoint(eid)
ep := n.endpoints[eid]
if ep == nil {
return types.InternalMaskableErrorf("could not find endpoint with id %s", eid)
}
d.peerDelete(nid, eid, ep.addr.IP, ep.addr.Mask, ep.mac, d.advertiseAddress, true)
if err := n.peerDelete(eid, ep.addr, ep.mac, netip.Addr{}); err != nil {
return fmt.Errorf("overlay: failed to delete local endpoint eid:%s from network peer db: %w", eid, err)
}
n.leaveSandbox()
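The new `EventNotify(nid, tableName, key string, prev, value []byte)` signature in the hunk above delivers the previous table entry alongside the new one, so the handler can express every event as "delete the previous peer, add the new peer" and short-circuit updates that do not materially change the FDB entry. A sketch of that delta logic under simplified comparable types (the real driver compares `*Peer` structs decoded by `UnmarshalPeerRecord`):

```go
package main

import "fmt"

// peer stands in for the decoded Peer record; comparable, so == works.
type peer struct {
	endpointIP string
	vtep       string
}

// applyDelta returns the FDB operations an update implies: at most one
// delete (of the previous state) and one add (of the new state).
func applyDelta(prev, next *peer) (ops []string) {
	if prev == nil && next == nil {
		return nil // local peer or decode failure: nothing to do
	}
	if prev != nil && next != nil && *prev == *next {
		return nil // no material change to the FDB entry
	}
	if prev != nil {
		ops = append(ops, "delete "+prev.endpointIP)
	}
	if next != nil {
		ops = append(ops, "add "+next.endpointIP)
	}
	return ops
}

func main() {
	p := &peer{"10.0.1.2/24", "192.168.0.5"}
	fmt.Println(applyDelta(nil, p)) // create
	fmt.Println(applyDelta(p, &peer{"10.0.1.2/24", "192.168.0.5"})) // no-op
	fmt.Println(applyDelta(p, nil)) // delete
}
```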

@@ -5,10 +5,12 @@ package overlay
import (
"context"
"fmt"
"net"
"net/netip"
"github.com/containerd/log"
"github.com/docker/docker/libnetwork/driverapi"
"github.com/docker/docker/libnetwork/internal/hashable"
"github.com/docker/docker/libnetwork/internal/netiputil"
"github.com/docker/docker/libnetwork/netutils"
"github.com/docker/docker/libnetwork/ns"
)
@@ -19,27 +21,8 @@ type endpoint struct {
id string
nid string
ifName string
mac net.HardwareAddr
addr *net.IPNet
}
func (n *network) endpoint(eid string) *endpoint {
n.Lock()
defer n.Unlock()
return n.endpoints[eid]
}
func (n *network) addEndpoint(ep *endpoint) {
n.Lock()
n.endpoints[ep.id] = ep
n.Unlock()
}
func (n *network) deleteEndpoint(eid string) {
n.Lock()
delete(n.endpoints, eid)
n.Unlock()
mac hashable.MACAddr
addr netip.Prefix
}
func (d *driver) CreateEndpoint(nid, eid string, ifInfo driverapi.InterfaceInfo, epOptions map[string]interface{}) error {
@@ -55,18 +38,19 @@ func (d *driver) CreateEndpoint(nid, eid string, ifInfo driverapi.InterfaceInfo,
return err
}
n := d.network(nid)
if n == nil {
return fmt.Errorf("network id %q not found", nid)
n, unlock, err := d.lockNetwork(nid)
if err != nil {
return err
}
defer unlock()
ep := &endpoint{
id: eid,
nid: n.id,
addr: ifInfo.Address(),
mac: ifInfo.MacAddress(),
id: eid,
nid: n.id,
}
if ep.addr == nil {
var ok bool
ep.addr, ok = netiputil.ToPrefix(ifInfo.Address())
if !ok {
return fmt.Errorf("create endpoint was not passed interface IP address")
}
@@ -74,14 +58,24 @@ func (d *driver) CreateEndpoint(nid, eid string, ifInfo driverapi.InterfaceInfo,
return fmt.Errorf("no matching subnet for IP %q in network %q", ep.addr, nid)
}
if ep.mac == nil {
ep.mac = netutils.GenerateMACFromIP(ep.addr.IP)
if err := ifInfo.SetMacAddress(ep.mac); err != nil {
if ifmac := ifInfo.MacAddress(); ifmac != nil {
var ok bool
ep.mac, ok = hashable.MACAddrFromSlice(ifInfo.MacAddress())
if !ok {
return fmt.Errorf("invalid MAC address %q assigned to endpoint: unexpected length", ifmac)
}
} else {
var ok bool
ep.mac, ok = hashable.MACAddrFromSlice(netutils.GenerateMACFromIP(ep.addr.Addr().AsSlice()))
if !ok {
panic("GenerateMACFromIP returned a HardwareAddress that is not a MAC-48")
}
if err := ifInfo.SetMacAddress(ep.mac.AsSlice()); err != nil {
return err
}
}
n.addEndpoint(ep)
n.endpoints[ep.id] = ep
return nil
}
@@ -93,17 +87,18 @@ func (d *driver) DeleteEndpoint(nid, eid string) error {
return err
}
n := d.network(nid)
if n == nil {
return fmt.Errorf("network id %q not found", nid)
n, unlock, err := d.lockNetwork(nid)
if err != nil {
return err
}
defer unlock()
ep := n.endpoint(eid)
ep := n.endpoints[eid]
if ep == nil {
return fmt.Errorf("endpoint id %q not found", eid)
}
n.deleteEndpoint(eid)
delete(n.endpoints, eid)
if ep.ifName == "" {
return nil

@@ -1,4 +1,5 @@
//go:build linux
// FIXME(thaJeztah): remove once we are a module; the go:build directive prevents go from downgrading language version to go1.16:
//go:build go1.23 && linux
package overlay
@@ -6,7 +7,7 @@ import (
"context"
"errors"
"fmt"
"net"
"net/netip"
"os"
"path/filepath"
"runtime"
@@ -17,6 +18,9 @@ import (
"github.com/containerd/log"
"github.com/docker/docker/libnetwork/driverapi"
"github.com/docker/docker/libnetwork/drivers/overlay/overlayutils"
"github.com/docker/docker/libnetwork/internal/countmap"
"github.com/docker/docker/libnetwork/internal/hashable"
"github.com/docker/docker/libnetwork/internal/netiputil"
"github.com/docker/docker/libnetwork/netlabel"
"github.com/docker/docker/libnetwork/ns"
"github.com/docker/docker/libnetwork/osl"
@@ -41,23 +45,32 @@ type subnet struct {
brName string
vni uint32
initErr error
subnetIP *net.IPNet
gwIP *net.IPNet
subnetIP netip.Prefix
gwIP netip.Prefix
}
type network struct {
id string
id string
driver *driver
secure bool
mtu int
// mu must be held when accessing any of the variable struct fields below,
// calling any method on the network not noted as safe for concurrent use,
// or manipulating the driver.networks key for this network id.
// This mutex is at the top of the lock hierarchy: any other locks in
// package structs can be locked while holding this lock.
mu sync.Mutex
sbox *osl.Namespace
endpoints endpointTable
driver *driver
joinCnt int
// Ref count of VXLAN Forwarding Database entries programmed into the kernel
fdbCnt countmap.Map[hashable.IPMAC]
sboxInit bool
initEpoch int
initErr error
subnets []*subnet
secure bool
mtu int
sync.Mutex
peerdb peerMap
}
func init() {
@@ -97,6 +110,7 @@ func (d *driver) CreateNetwork(id string, option map[string]interface{}, nInfo d
driver: d,
endpoints: endpointTable{},
subnets: []*subnet{},
fdbCnt: countmap.Map[hashable.IPMAC]{},
}
vnis := make([]uint32, 0, len(ipV4Data))
@@ -137,19 +151,41 @@ func (d *driver) CreateNetwork(id string, option map[string]interface{}, nInfo d
}
for i, ipd := range ipV4Data {
s := &subnet{
subnetIP: ipd.Pool,
gwIP: ipd.Gateway,
vni: vnis[i],
}
s := &subnet{vni: vnis[i]}
s.subnetIP, _ = netiputil.ToPrefix(ipd.Pool)
s.gwIP, _ = netiputil.ToPrefix(ipd.Gateway)
n.subnets = append(n.subnets, s)
}
d.Lock()
defer d.Unlock()
if d.networks[n.id] != nil {
return fmt.Errorf("attempt to create overlay network %v that already exists", n.id)
// Lock the network before adding it to the networks table so we can
// release the big driver lock before we finish initializing the network
// while continuing to exclude other operations on the network from
// proceeding until we are done.
n.mu.Lock()
defer n.mu.Unlock()
d.mu.Lock()
oldnet := d.networks[id]
if oldnet == nil {
d.networks[id] = n
d.mu.Unlock()
} else {
// The network already exists, but we might be racing DeleteNetwork.
// Synchronize and check again.
d.mu.Unlock()
oldnet.mu.Lock()
d.mu.Lock()
_, ok := d.networks[id]
if !ok {
// It's gone! Stake our claim to the network id.
d.networks[id] = n
}
d.mu.Unlock()
oldnet.mu.Unlock()
if ok {
return fmt.Errorf("attempt to create overlay network %v that already exists", n.id)
}
}
// Make sure no rule is on the way from any stale secure network
@@ -161,14 +197,11 @@ func (d *driver) CreateNetwork(id string, option map[string]interface{}, nInfo d
}
if nInfo != nil {
if err := nInfo.TableEventRegister(ovPeerTable, driverapi.EndpointObject); err != nil {
// XXX Undo writeToStore? No method to so. Why?
if err := nInfo.TableEventRegister(OverlayPeerTable, driverapi.EndpointObject); err != nil {
return err
}
}
d.networks[id] = n
return nil
}
@@ -182,23 +215,14 @@ func (d *driver) DeleteNetwork(nid string) error {
return err
}
d.Lock()
// Only perform a peer flush operation (if required) AFTER unlocking
// the driver lock to avoid deadlocking w/ the peerDB.
var doPeerFlush bool
defer func() {
d.Unlock()
if doPeerFlush {
d.peerFlush(nid)
}
}()
// This is similar to d.network(), but we need to keep holding the lock
// until we are done removing this network.
n := d.networks[nid]
if n == nil {
return fmt.Errorf("could not find network with id %s", nid)
n, unlock, err := d.lockNetwork(nid)
if err != nil {
return err
}
// Unlock the network even if it's going to become garbage as another
// goroutine could be blocked waiting for the lock, such as in
// (*driver).lockNetwork.
defer unlock()
for _, ep := range n.endpoints {
if ep.ifName != "" {
@@ -210,9 +234,6 @@ func (d *driver) DeleteNetwork(nid string) error {
}
}
doPeerFlush = true
delete(d.networks, nid)
if n.secure {
for _, s := range n.subnets {
if err := d.programMangle(s.vni, false); err != nil {
@@ -232,6 +253,10 @@ func (d *driver) DeleteNetwork(nid string) error {
}
}
d.mu.Lock()
delete(d.networks, nid)
d.mu.Unlock()
return nil
}
@@ -248,22 +273,11 @@ func (n *network) joinSandbox(s *subnet, incJoinCount bool) error {
// the other will wait.
networkOnce.Do(populateVNITbl)
n.Lock()
// If initialization was successful then tell the peerDB to initialize the
// sandbox with all the peers previously received from networkdb. But only
// do this after unlocking the network. Otherwise we could deadlock with
// on the peerDB channel while peerDB is waiting for the network lock.
var doInitPeerDB bool
defer func() {
n.Unlock()
if doInitPeerDB {
go n.driver.initSandboxPeerDB(n.id)
}
}()
var initialized bool
if !n.sboxInit {
n.initErr = n.initSandbox()
doInitPeerDB = n.initErr == nil
initialized = n.initErr == nil
// If there was an error, we cannot recover it
n.sboxInit = true
}
@@ -289,12 +303,19 @@ func (n *network) joinSandbox(s *subnet, incJoinCount bool) error {
n.joinCnt++
}
if initialized {
if err := n.initSandboxPeerDB(); err != nil {
log.G(context.TODO()).WithFields(log.Fields{
"nid": n.id,
"error": err,
}).Warn("failed to initialize network peer database")
}
}
return nil
}
func (n *network) leaveSandbox() {
n.Lock()
defer n.Unlock()
n.joinCnt--
if n.joinCnt != 0 {
return
@@ -426,7 +447,7 @@ func (n *network) setupSubnetSandbox(s *subnet, brName, vxlanName string) error
// create a bridge and vxlan device for this subnet and move it to the sandbox
sbox := n.sbox
if err := sbox.AddInterface(brName, "br", osl.WithIPv4Address(s.gwIP), osl.WithIsBridge(true)); err != nil {
if err := sbox.AddInterface(brName, "br", osl.WithIPv4Address(netiputil.ToIPNet(s.gwIP)), osl.WithIsBridge(true)); err != nil {
return fmt.Errorf("bridge creation in sandbox failed for subnet %q: %v", s.subnetIP.String(), err)
}
@@ -594,34 +615,50 @@ func (n *network) initSandbox() error {
// this is needed to let the peerAdd configure the sandbox
n.sbox = sbox
n.fdbCnt = countmap.Map[hashable.IPMAC]{}
return nil
}
func (d *driver) network(nid string) *network {
d.Lock()
n := d.networks[nid]
d.Unlock()
return n
}
func (n *network) sandbox() *osl.Namespace {
n.Lock()
defer n.Unlock()
return n.sbox
// lockNetwork returns the network object for nid, locked for exclusive access.
//
// It is the caller's responsibility to release the network lock by calling the
// returned unlock function.
func (d *driver) lockNetwork(nid string) (n *network, unlock func(), err error) {
d.mu.Lock()
n = d.networks[nid]
d.mu.Unlock()
for {
if n == nil {
return nil, nil, fmt.Errorf("network %q not found", nid)
}
// We can't lock the network object while holding the driver
// lock or we risk a lock order reversal deadlock.
n.mu.Lock()
// d.networks[nid] might have been replaced or removed after we
// unlocked the driver lock. Double-check that the network we
// just locked is the active network object for the nid.
d.mu.Lock()
n2 := d.networks[nid]
d.mu.Unlock()
if n2 == n {
return n, n.mu.Unlock, nil
}
// We locked a garbage object. Spin until the network we locked
// matches up with the one present in the table.
n.mu.Unlock()
n = n2
}
}
// getSubnetforIP returns the subnet to which the given IP belongs
func (n *network) getSubnetforIP(ip *net.IPNet) *subnet {
func (n *network) getSubnetforIP(ip netip.Prefix) *subnet {
for _, s := range n.subnets {
// first check if the mask lengths are the same
i, _ := s.subnetIP.Mask.Size()
j, _ := ip.Mask.Size()
if i != j {
if s.subnetIP.Bits() != ip.Bits() {
continue
}
if s.subnetIP.Contains(ip.IP) {
if s.subnetIP.Contains(ip.Addr()) {
return s
}
}
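`lockNetwork` above avoids a lock-order reversal by never acquiring a network lock while holding the driver lock: it locks the candidate network, then re-checks that it is still the registered object for that id, retrying if it lost a race with `CreateNetwork`/`DeleteNetwork`. A standalone sketch of the pattern with illustrative types (not the driver's own):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type network struct {
	mu sync.Mutex
	id string
}

type registry struct {
	mu       sync.Mutex
	networks map[string]*network
}

// lockNetwork returns the network locked for exclusive access, plus an
// unlock function. It must not hold r.mu while locking n.mu.
func (r *registry) lockNetwork(id string) (*network, func(), error) {
	r.mu.Lock()
	n := r.networks[id]
	r.mu.Unlock()
	for {
		if n == nil {
			return nil, nil, errors.New("network not found: " + id)
		}
		n.mu.Lock() // taken without holding r.mu: no lock-order reversal
		// The table may have changed while we held neither lock;
		// confirm we locked the object that is still registered.
		r.mu.Lock()
		n2 := r.networks[id]
		r.mu.Unlock()
		if n2 == n {
			return n, n.mu.Unlock, nil
		}
		n.mu.Unlock() // locked a stale object; retry with the current one
		n = n2
	}
}

func main() {
	r := &registry{networks: map[string]*network{"nid1": {id: "nid1"}}}
	n, unlock, err := r.lockNetwork("nid1")
	fmt.Println(n.id, err) // nid1 <nil>
	unlock()
	_, _, err = r.lockNetwork("missing")
	fmt.Println(err != nil) // true
}
```

The retry loop is why `DeleteNetwork` in the diff is careful to unlock even a network that is "going to become garbage": another goroutine may already be blocked inside this spin on the stale object.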

@@ -7,7 +7,7 @@ package overlay
import (
"context"
"fmt"
"net"
"net/netip"
"sync"
"github.com/containerd/log"
@@ -24,32 +24,44 @@ const (
secureOption = "encrypted"
)
// overlay driver must implement the discover-API.
var _ discoverapi.Discover = (*driver)(nil)
var (
_ discoverapi.Discover = (*driver)(nil)
_ driverapi.TableWatcher = (*driver)(nil)
)
type driver struct {
bindAddress, advertiseAddress net.IP
// Immutable; mu does not need to be held when accessing these fields.
config map[string]interface{}
peerDb peerNetworkMap
secMap *encrMap
networks networkTable
initOS sync.Once
localJoinOnce sync.Once
keys []*key
peerOpMu sync.Mutex
sync.Mutex
config map[string]interface{}
initOS sync.Once
// encrMu guards secMap and keys,
// and synchronizes the application of encryption parameters
// to the kernel.
//
// This mutex is above mu in the lock hierarchy.
// Do not lock any locks aside from mu while holding encrMu.
encrMu sync.Mutex
secMap encrMap
keys []*key
// mu must be held when accessing the fields which follow it
// in the struct definition.
//
// This mutex is at the bottom of the lock hierarchy:
// do not lock any other locks while holding it.
mu sync.Mutex
bindAddress netip.Addr
advertiseAddress netip.Addr
networks networkTable
}
// Register registers a new instance of the overlay driver.
func Register(r driverapi.Registerer, config map[string]interface{}) error {
d := &driver{
networks: networkTable{},
peerDb: peerNetworkMap{
mp: map[string]*peerMap{},
},
secMap: &encrMap{nodes: map[string][]*spi{}},
config: config,
secMap: encrMap{},
config: config,
}
return r.RegisterDriver(NetworkType, d, driverapi.Capability{
DataScope: scope.Global,
@@ -78,28 +90,23 @@ func (d *driver) isIPv6Transport() (bool, error) {
// from the address family of our own advertise address. This is a
// reasonable inference to make as Linux VXLAN links do not support
// mixed-address-family remote peers.
if d.advertiseAddress == nil {
if !d.advertiseAddress.IsValid() {
return false, fmt.Errorf("overlay: cannot determine address family of transport: the local data-plane address is not currently known")
}
return d.advertiseAddress.To4() == nil, nil
return d.advertiseAddress.Is6(), nil
}
func (d *driver) nodeJoin(data discoverapi.NodeDiscoveryData) error {
if data.Self {
advAddr, bindAddr := net.ParseIP(data.Address), net.ParseIP(data.BindAddress)
if advAddr == nil {
advAddr, _ := netip.ParseAddr(data.Address)
bindAddr, _ := netip.ParseAddr(data.BindAddress)
if !advAddr.IsValid() {
return fmt.Errorf("invalid discovery data")
}
d.Lock()
d.mu.Lock()
d.advertiseAddress = advAddr
d.bindAddress = bindAddr
d.Unlock()
// If containers are already running on this network update the
// advertise address in the peerDB
d.localJoinOnce.Do(func() {
d.peerDBUpdateSelf()
})
d.mu.Unlock()
}
return nil
}

@@ -20,7 +20,7 @@ func (dt *driverTester) GetPluginGetter() plugingetter.PluginGetter {
return nil
}
func (dt *driverTester) RegisterDriver(name string, drv driverapi.Driver, cap driverapi.Capability) error {
func (dt *driverTester) RegisterDriver(name string, drv driverapi.Driver, capability driverapi.Capability) error {
if name != testNetworkType {
dt.t.Fatalf("Expected driver register name to be %q. Instead got %q",
testNetworkType, name)

@@ -166,13 +166,6 @@ func (d *driver) CreateNetwork(id string, option map[string]interface{}, nInfo d
return types.NotImplementedErrorf("not implemented")
}
func (d *driver) EventNotify(etype driverapi.EventType, nid, tableName, key string, value []byte) {
}
func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, map[string]string) {
return "", nil
}
func (d *driver) DeleteNetwork(nid string) error {
return types.NotImplementedErrorf("not implemented")
}


@@ -0,0 +1,42 @@
package overlay
import (
"fmt"
"net/netip"
"github.com/docker/docker/libnetwork/internal/hashable"
"github.com/gogo/protobuf/proto"
)
// OverlayPeerTable is the NetworkDB table for overlay network peer discovery.
const OverlayPeerTable = "overlay_peer_table"
type Peer struct {
EndpointIP netip.Prefix
EndpointMAC hashable.MACAddr
TunnelEndpointIP netip.Addr
}
func UnmarshalPeerRecord(data []byte) (*Peer, error) {
var pr PeerRecord
if err := proto.Unmarshal(data, &pr); err != nil {
return nil, fmt.Errorf("failed to unmarshal peer record: %w", err)
}
var (
p Peer
err error
)
p.EndpointIP, err = netip.ParsePrefix(pr.EndpointIP)
if err != nil {
return nil, fmt.Errorf("invalid peer IP %q received: %w", pr.EndpointIP, err)
}
p.EndpointMAC, err = hashable.ParseMAC(pr.EndpointMAC)
if err != nil {
return nil, fmt.Errorf("invalid MAC %q received: %w", pr.EndpointMAC, err)
}
p.TunnelEndpointIP, err = netip.ParseAddr(pr.TunnelEndpointIP)
if err != nil {
return nil, fmt.Errorf("invalid VTEP %q received: %w", pr.TunnelEndpointIP, err)
}
return &p, nil
}
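`UnmarshalPeerRecord` validates each string field of the wire record as it converts to the typed form. A self-contained sketch of that parse-and-validate step, using only the standard library (`net.HardwareAddr` stands in for the internal `hashable.MACAddr`, and `parsePeerStrings` is an illustrative name):

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
)

// parsePeerStrings mirrors the validation UnmarshalPeerRecord performs on the
// three string fields of a PeerRecord, failing fast on the first bad field.
func parsePeerStrings(epIP, epMAC, vtep string) (netip.Prefix, net.HardwareAddr, netip.Addr, error) {
	pfx, err := netip.ParsePrefix(epIP)
	if err != nil {
		return netip.Prefix{}, nil, netip.Addr{}, fmt.Errorf("invalid peer IP %q received: %w", epIP, err)
	}
	mac, err := net.ParseMAC(epMAC)
	if err != nil {
		return netip.Prefix{}, nil, netip.Addr{}, fmt.Errorf("invalid MAC %q received: %w", epMAC, err)
	}
	addr, err := netip.ParseAddr(vtep)
	if err != nil {
		return netip.Prefix{}, nil, netip.Addr{}, fmt.Errorf("invalid VTEP %q received: %w", vtep, err)
	}
	return pfx, mac, addr, nil
}

func main() {
	pfx, mac, vtep, err := parsePeerStrings("10.0.0.2/24", "02:42:0a:00:00:02", "192.0.2.10")
	fmt.Println(pfx, mac, vtep, err)
}
```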


@@ -5,227 +5,75 @@ package overlay
import (
"context"
"errors"
"fmt"
"net"
"sync"
"net/netip"
"syscall"
"github.com/containerd/log"
"github.com/docker/docker/libnetwork/internal/hashable"
"github.com/docker/docker/libnetwork/internal/setmatrix"
"github.com/docker/docker/libnetwork/osl"
)
const ovPeerTable = "overlay_peer_table"
type peerKey struct {
peerIP net.IP
peerMac net.HardwareAddr
}
type peerEntry struct {
eid string
vtep net.IP
peerIPMask net.IPMask
isLocal bool
eid string
mac hashable.MACAddr
vtep netip.Addr
}
func (p *peerEntry) MarshalDB() peerEntryDB {
ones, bits := p.peerIPMask.Size()
return peerEntryDB{
eid: p.eid,
vtep: p.vtep.String(),
peerIPMaskOnes: ones,
peerIPMaskBits: bits,
isLocal: p.isLocal,
}
}
// This is the structure saved into the set (SetMatrix). Due to its
// implementation, the values inserted in the set have to be hashable, so the
// []byte fields had to be converted into strings.
type peerEntryDB struct {
eid string
vtep string
peerIPMaskOnes int
peerIPMaskBits int
isLocal bool
}
func (p *peerEntryDB) UnMarshalDB() peerEntry {
return peerEntry{
eid: p.eid,
vtep: net.ParseIP(p.vtep),
peerIPMask: net.CIDRMask(p.peerIPMaskOnes, p.peerIPMaskBits),
isLocal: p.isLocal,
}
func (p *peerEntry) isLocal() bool {
return !p.vtep.IsValid()
}
type peerMap struct {
// set of peerEntry, note the values have to be objects and not pointers to maintain the proper equality checks
mp setmatrix.SetMatrix[peerEntryDB]
sync.Mutex
mp setmatrix.SetMatrix[netip.Prefix, peerEntry]
}
type peerNetworkMap struct {
// map with key peerKey
mp map[string]*peerMap
sync.Mutex
}
func (pKey peerKey) String() string {
return fmt.Sprintf("%s %s", pKey.peerIP, pKey.peerMac)
}
func (pKey *peerKey) Scan(state fmt.ScanState, verb rune) error {
ipB, err := state.Token(true, nil)
if err != nil {
return err
}
pKey.peerIP = net.ParseIP(string(ipB))
macB, err := state.Token(true, nil)
if err != nil {
return err
}
pKey.peerMac, err = net.ParseMAC(string(macB))
return err
}
func (d *driver) peerDbWalk(f func(string, *peerKey, *peerEntry) bool) error {
d.peerDb.Lock()
nids := []string{}
for nid := range d.peerDb.mp {
nids = append(nids, nid)
}
d.peerDb.Unlock()
for _, nid := range nids {
d.peerDbNetworkWalk(nid, func(pKey *peerKey, pEntry *peerEntry) bool {
return f(nid, pKey, pEntry)
})
}
return nil
}
func (d *driver) peerDbNetworkWalk(nid string, f func(*peerKey, *peerEntry) bool) error {
d.peerDb.Lock()
pMap, ok := d.peerDb.mp[nid]
d.peerDb.Unlock()
if !ok {
return nil
}
mp := map[string]peerEntry{}
pMap.Lock()
for _, pKeyStr := range pMap.mp.Keys() {
entryDBList, ok := pMap.mp.Get(pKeyStr)
func (pm *peerMap) Walk(f func(netip.Prefix, peerEntry)) {
for _, peerAddr := range pm.mp.Keys() {
entry, ok := pm.Get(peerAddr)
if ok {
peerEntryDB := entryDBList[0]
mp[pKeyStr] = peerEntryDB.UnMarshalDB()
f(peerAddr, entry)
}
}
pMap.Unlock()
for pKeyStr, pEntry := range mp {
var pKey peerKey
pEntry := pEntry
if _, err := fmt.Sscan(pKeyStr, &pKey); err != nil {
log.G(context.TODO()).Warnf("Peer key scan on network %s failed: %v", nid, err)
}
if f(&pKey, &pEntry) {
return nil
}
}
return nil
}
func (d *driver) peerDbSearch(nid string, peerIP net.IP) (*peerKey, *peerEntry, error) {
var pKeyMatched *peerKey
var pEntryMatched *peerEntry
err := d.peerDbNetworkWalk(nid, func(pKey *peerKey, pEntry *peerEntry) bool {
if pKey.peerIP.Equal(peerIP) {
pKeyMatched = pKey
pEntryMatched = pEntry
return true
}
return false
})
if err != nil {
return nil, nil, fmt.Errorf("peerdb search for peer ip %q failed: %v", peerIP, err)
func (pm *peerMap) Get(peerIP netip.Prefix) (peerEntry, bool) {
c, _ := pm.mp.Get(peerIP)
if len(c) == 0 {
return peerEntry{}, false
}
if pKeyMatched == nil || pEntryMatched == nil {
return nil, nil, fmt.Errorf("peer ip %q not found in peerdb", peerIP)
}
return pKeyMatched, pEntryMatched, nil
return c[0], true
}
func (d *driver) peerDbAdd(nid, eid string, peerIP net.IP, peerIPMask net.IPMask, peerMac net.HardwareAddr, vtep net.IP, isLocal bool) (bool, int) {
d.peerDb.Lock()
pMap, ok := d.peerDb.mp[nid]
if !ok {
pMap = &peerMap{}
d.peerDb.mp[nid] = pMap
}
d.peerDb.Unlock()
pKey := peerKey{
peerIP: peerIP,
peerMac: peerMac,
}
func (pm *peerMap) Add(eid string, peerIP netip.Prefix, peerMac hashable.MACAddr, vtep netip.Addr) (bool, int) {
pEntry := peerEntry{
eid: eid,
vtep: vtep,
peerIPMask: peerIPMask,
isLocal: isLocal,
eid: eid,
mac: peerMac,
vtep: vtep,
}
pMap.Lock()
defer pMap.Unlock()
b, i := pMap.mp.Insert(pKey.String(), pEntry.MarshalDB())
b, i := pm.mp.Insert(peerIP, pEntry)
if i != 1 {
// Transient case, there is more than one endpoint that is using the same IP,MAC pair
s, _ := pMap.mp.String(pKey.String())
log.G(context.TODO()).Warnf("peerDbAdd transient condition - Key:%s cardinality:%d db state:%s", pKey.String(), i, s)
// Transient case, there is more than one endpoint that is using the same IP
s, _ := pm.mp.String(peerIP)
log.G(context.TODO()).Warnf("peerDbAdd transient condition - Key:%s cardinality:%d db state:%s", peerIP, i, s)
}
return b, i
}
func (d *driver) peerDbDelete(nid, eid string, peerIP net.IP, peerIPMask net.IPMask, peerMac net.HardwareAddr, vtep net.IP, isLocal bool) (bool, int) {
d.peerDb.Lock()
pMap, ok := d.peerDb.mp[nid]
if !ok {
d.peerDb.Unlock()
return false, 0
}
d.peerDb.Unlock()
pKey := peerKey{
peerIP: peerIP,
peerMac: peerMac,
}
func (pm *peerMap) Delete(eid string, peerIP netip.Prefix, peerMac hashable.MACAddr, vtep netip.Addr) (bool, int) {
pEntry := peerEntry{
eid: eid,
vtep: vtep,
peerIPMask: peerIPMask,
isLocal: isLocal,
eid: eid,
mac: peerMac,
vtep: vtep,
}
pMap.Lock()
defer pMap.Unlock()
b, i := pMap.mp.Remove(pKey.String(), pEntry.MarshalDB())
b, i := pm.mp.Remove(peerIP, pEntry)
if i != 0 {
// Transient case, there is more than one endpoint that is using the same IP,MAC pair
s, _ := pMap.mp.String(pKey.String())
log.G(context.TODO()).Warnf("peerDbDelete transient condition - Key:%s cardinality:%d db state:%s", pKey.String(), i, s)
// Transient case, there is more than one endpoint that is using the same IP
s, _ := pm.mp.String(peerIP)
log.G(context.TODO()).Warnf("peerDbDelete transient condition - Key:%s cardinality:%d db state:%s", peerIP, i, s)
}
return b, i
}
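`setmatrix.SetMatrix` is a libnetwork-internal type; the Add/Delete code above relies on its contract that Insert and Remove report whether the set changed plus the new cardinality, which is how the "transient" multiple-endpoints-per-IP case is detected. A minimal generic stand-in (names here are illustrative, not the real package):

```go
package main

import "fmt"

// setMatrix is an illustrative stand-in for libnetwork's internal
// setmatrix.SetMatrix: a map from key to a set of values, where Insert and
// Remove return (changed, cardinality-after-the-operation).
type setMatrix[K comparable, V comparable] struct {
	m map[K]map[V]struct{}
}

func newSetMatrix[K comparable, V comparable]() *setMatrix[K, V] {
	return &setMatrix[K, V]{m: map[K]map[V]struct{}{}}
}

func (s *setMatrix[K, V]) Insert(k K, v V) (bool, int) {
	set, ok := s.m[k]
	if !ok {
		set = map[V]struct{}{}
		s.m[k] = set
	}
	_, exists := set[v]
	set[v] = struct{}{}
	return !exists, len(set)
}

func (s *setMatrix[K, V]) Remove(k K, v V) (bool, int) {
	set, ok := s.m[k]
	if !ok {
		return false, 0
	}
	_, exists := set[v]
	delete(set, v)
	if len(set) == 0 {
		delete(s.m, k)
	}
	return exists, len(set)
}

func main() {
	sm := newSetMatrix[string, string]()
	fmt.Println(sm.Insert("10.0.0.2/24", "ep1")) // true 1
	fmt.Println(sm.Insert("10.0.0.2/24", "ep2")) // true 2  (the "transient" case: cardinality != 1)
	fmt.Println(sm.Remove("10.0.0.2/24", "ep1")) // true 1  (an entry remains to be restored)
}
```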
@@ -237,201 +85,162 @@ func (d *driver) peerDbDelete(nid, eid string, peerIP net.IP, peerIPMask net.IPM
// networkDB has already delivered some events for peers that are already
// present on remote nodes. Those peers are saved into the peerDB, and this
// function configures the network sandbox with all the peers that were
// previously notified.
// Note also that this method sends a single message on the channel, and the
// goroutine on the other side will atomically loop over the whole table of
// peers and program their state in one single atomic operation. This is
// fundamental to guarantee consistency, and to avoid new peerAdd or
// peerDelete calls being reordered during sandbox init.
func (d *driver) initSandboxPeerDB(nid string) {
d.peerOpMu.Lock()
defer d.peerOpMu.Unlock()
if err := d.peerInitOp(nid); err != nil {
log.G(context.TODO()).WithError(err).Warn("Peer init operation failed")
}
}
func (d *driver) peerInitOp(nid string) error {
return d.peerDbNetworkWalk(nid, func(pKey *peerKey, pEntry *peerEntry) bool {
// Local entries do not need to be added
if pEntry.isLocal {
return false
//
// The caller is responsible for ensuring that peerAdd and peerDelete are not
// called concurrently with this function to guarantee consistency.
func (n *network) initSandboxPeerDB() error {
var errs []error
n.peerdb.Walk(func(peerIP netip.Prefix, pEntry peerEntry) {
if !pEntry.isLocal() {
if err := n.addNeighbor(peerIP, pEntry.mac, pEntry.vtep); err != nil {
errs = append(errs, fmt.Errorf("failed to add neighbor entries for %s: %w", peerIP, err))
}
}
d.peerAddOp(nid, pEntry.eid, pKey.peerIP, pEntry.peerIPMask, pKey.peerMac, pEntry.vtep, false, false, false, pEntry.isLocal)
// return false to loop on all entries
return false
})
return errors.Join(errs...)
}
func (d *driver) peerAdd(nid, eid string, peerIP net.IP, peerIPMask net.IPMask, peerMac net.HardwareAddr, vtep net.IP, l2Miss, l3Miss, localPeer bool) {
d.peerOpMu.Lock()
defer d.peerOpMu.Unlock()
err := d.peerAddOp(nid, eid, peerIP, peerIPMask, peerMac, vtep, l2Miss, l3Miss, true, localPeer)
if err != nil {
log.G(context.TODO()).WithError(err).Warn("Peer add operation failed")
}
}
func (d *driver) peerAddOp(nid, eid string, peerIP net.IP, peerIPMask net.IPMask, peerMac net.HardwareAddr, vtep net.IP, l2Miss, l3Miss, updateDB, localPeer bool) error {
if err := validateID(nid, eid); err != nil {
return err
// peerAdd adds a new entry to the peer database.
//
// Local peers are signified by an invalid vtep (i.e. netip.Addr{}).
func (n *network) peerAdd(eid string, peerIP netip.Prefix, peerMac hashable.MACAddr, vtep netip.Addr) error {
if eid == "" {
return errors.New("invalid endpoint id")
}
var dbEntries int
var inserted bool
if updateDB {
inserted, dbEntries = d.peerDbAdd(nid, eid, peerIP, peerIPMask, peerMac, vtep, localPeer)
if !inserted {
log.G(context.TODO()).Warnf("Entry already present in db: nid:%s eid:%s peerIP:%v peerMac:%v isLocal:%t vtep:%v",
nid, eid, peerIP, peerMac, localPeer, vtep)
inserted, dbEntries := n.peerdb.Add(eid, peerIP, peerMac, vtep)
if !inserted {
log.G(context.TODO()).Warnf("Entry already present in db: nid:%s eid:%s peerIP:%v peerMac:%v vtep:%v",
n.id, eid, peerIP, peerMac, vtep)
}
if vtep.IsValid() {
err := n.addNeighbor(peerIP, peerMac, vtep)
if err != nil {
if dbEntries > 1 && errors.As(err, &osl.NeighborSearchError{}) {
// Conflicting neighbor entries are already programmed into the kernel and we are in the transient case.
// Upon deletion if the active configuration is deleted the next one from the database will be restored.
return nil
}
return fmt.Errorf("peer add operation failed: %w", err)
}
}
return nil
}
// Local peers do not need any further configuration
if localPeer {
return nil
}
n := d.network(nid)
if n == nil {
return nil
}
sbox := n.sandbox()
if sbox == nil {
// addNeighbor programs the kernel so the given peer is reachable through the VXLAN tunnel.
func (n *network) addNeighbor(peerIP netip.Prefix, peerMac hashable.MACAddr, vtep netip.Addr) error {
if n.sbox == nil {
// We hit this case for all the events that arrive before the sandbox is
// created. The peer has already been added to the database, and the sandbox
// init will call peerDbUpdateSandbox, which will configure all these peers from the database
return nil
}
IP := &net.IPNet{
IP: peerIP,
Mask: peerIPMask,
}
s := n.getSubnetforIP(IP)
s := n.getSubnetforIP(peerIP)
if s == nil {
return fmt.Errorf("couldn't find the subnet %q in network %q", IP.String(), n.id)
return fmt.Errorf("couldn't find the subnet %q in network %q", peerIP.String(), n.id)
}
if err := n.joinSandbox(s, false); err != nil {
return fmt.Errorf("subnet sandbox join failed for %q: %v", s.subnetIP.String(), err)
}
if err := d.checkEncryption(nid, vtep, false, true); err != nil {
log.G(context.TODO()).Warn(err)
if n.secure {
if err := n.driver.setupEncryption(vtep); err != nil {
log.G(context.TODO()).Warn(err)
}
}
// Add neighbor entry for the peer IP
if err := sbox.AddNeighbor(peerIP, peerMac, l3Miss, osl.WithLinkName(s.vxlanName)); err != nil {
if _, ok := err.(osl.NeighborSearchError); ok && dbEntries > 1 {
// We are in the transient case so only the first configuration is programmed into the kernel
// Upon deletion if the active configuration is deleted the next one from the database will be restored
// Note we are skipping also the next configuration
return nil
}
return fmt.Errorf("could not add neighbor entry for nid:%s eid:%s into the sandbox:%v", nid, eid, err)
if err := n.sbox.AddNeighbor(peerIP.Addr().AsSlice(), peerMac.AsSlice(), osl.WithLinkName(s.vxlanName)); err != nil {
return fmt.Errorf("could not add neighbor entry into the sandbox: %w", err)
}
// Add fdb entry to the bridge for the peer mac
if err := sbox.AddNeighbor(vtep, peerMac, l2Miss, osl.WithLinkName(s.vxlanName), osl.WithFamily(syscall.AF_BRIDGE)); err != nil {
return fmt.Errorf("could not add fdb entry for nid:%s eid:%s into the sandbox:%v", nid, eid, err)
if n.fdbCnt.Add(hashable.IPMACFrom(vtep, peerMac), 1) == 1 {
if err := n.sbox.AddNeighbor(vtep.AsSlice(), peerMac.AsSlice(), osl.WithLinkName(s.vxlanName), osl.WithFamily(syscall.AF_BRIDGE)); err != nil {
return fmt.Errorf("could not add fdb entry into the sandbox: %w", err)
}
}
return nil
}
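Several peer IPs can share a single (VTEP, MAC) bridge FDB entry, so `addNeighbor` programs the entry only when `fdbCnt.Add(..., 1)` returns 1 (first reference) and `deleteNeighbor` removes it only when `Add(..., -1)` returns 0 (last reference). A sketch of that reference counter, with illustrative names:

```go
package main

import "fmt"

// refCounter is an illustrative stand-in for the n.fdbCnt counter: Add
// returns the count after applying the delta, so ==1 means "first reference,
// program the FDB entry" and ==0 means "last reference gone, delete it".
type refCounter[K comparable] struct {
	m map[K]int
}

func (c *refCounter[K]) Add(k K, delta int) int {
	if c.m == nil {
		c.m = map[K]int{}
	}
	c.m[k] += delta
	n := c.m[k]
	if n == 0 {
		delete(c.m, k) // drop fully-released keys to keep the map small
	}
	return n
}

func main() {
	var fdb refCounter[string]
	key := "192.0.2.10|02:42:0a:00:00:02" // VTEP + peer MAC
	fmt.Println(fdb.Add(key, 1))  // 1: first peer behind this VTEP, add the FDB entry
	fmt.Println(fdb.Add(key, 1))  // 2: entry is shared, nothing to program
	fmt.Println(fdb.Add(key, -1)) // 1: still referenced, keep it
	fmt.Println(fdb.Add(key, -1)) // 0: last reference gone, delete the FDB entry
}
```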
func (d *driver) peerDelete(nid, eid string, peerIP net.IP, peerIPMask net.IPMask, peerMac net.HardwareAddr, vtep net.IP, localPeer bool) {
d.peerOpMu.Lock()
defer d.peerOpMu.Unlock()
err := d.peerDeleteOp(nid, eid, peerIP, peerIPMask, peerMac, vtep, localPeer)
if err != nil {
log.G(context.TODO()).WithError(err).Warn("Peer delete operation failed")
}
}
func (d *driver) peerDeleteOp(nid, eid string, peerIP net.IP, peerIPMask net.IPMask, peerMac net.HardwareAddr, vtep net.IP, localPeer bool) error {
if err := validateID(nid, eid); err != nil {
return err
// peerDelete removes an entry from the peer database.
//
// Local peers are signified by an invalid vtep (i.e. netip.Addr{}).
func (n *network) peerDelete(eid string, peerIP netip.Prefix, peerMac hashable.MACAddr, vtep netip.Addr) error {
if eid == "" {
return errors.New("invalid endpoint id")
}
deleted, dbEntries := d.peerDbDelete(nid, eid, peerIP, peerIPMask, peerMac, vtep, localPeer)
logger := log.G(context.TODO()).WithFields(log.Fields{
"nid": n.id,
"eid": eid,
"ip": peerIP,
"mac": peerMac,
"vtep": vtep,
})
deleted, dbEntries := n.peerdb.Delete(eid, peerIP, peerMac, vtep)
if !deleted {
log.G(context.TODO()).Warnf("Entry was not in db: nid:%s eid:%s peerIP:%v peerMac:%v isLocal:%t vtep:%v",
nid, eid, peerIP, peerMac, localPeer, vtep)
logger.Warn("Peer entry was not in db")
}
n := d.network(nid)
if n == nil {
return nil
}
sbox := n.sandbox()
if sbox == nil {
return nil
}
if err := d.checkEncryption(nid, vtep, localPeer, false); err != nil {
log.G(context.TODO()).Warn(err)
}
// Local peers do not have any local configuration to delete
if !localPeer {
// Remove fdb entry to the bridge for the peer mac
if err := sbox.DeleteNeighbor(vtep, peerMac); err != nil {
if _, ok := err.(osl.NeighborSearchError); ok && dbEntries > 0 {
if vtep.IsValid() {
err := n.deleteNeighbor(peerIP, peerMac, vtep)
if err != nil {
if dbEntries > 0 && errors.As(err, &osl.NeighborSearchError{}) {
// We fall in here if there is a transient state and the neighbor being
// deleted was never configured into the kernel (we allow only one configuration at a time per <ip,mac> mapping)
return nil
}
return fmt.Errorf("could not delete fdb entry for nid:%s eid:%s into the sandbox:%v", nid, eid, err)
}
// Delete neighbor entry for the peer IP
if err := sbox.DeleteNeighbor(peerIP, peerMac); err != nil {
return fmt.Errorf("could not delete neighbor entry for nid:%s eid:%s into the sandbox:%v", nid, eid, err)
logger.WithError(err).Warn("Peer delete operation failed")
}
}
if dbEntries == 0 {
return nil
if dbEntries > 0 {
// If there is still an entry in the database and the deletion went through without errors, it means that there is now no
// configuration active in the kernel.
// Restore one configuration for the IP directly from the database; note that it is guaranteed that there is one
peerEntry, ok := n.peerdb.Get(peerIP)
if !ok {
return fmt.Errorf("peerDelete: unable to restore a configuration: no entry for %v found in the database", peerIP)
}
err := n.addNeighbor(peerIP, peerEntry.mac, peerEntry.vtep)
if err != nil {
return fmt.Errorf("peer delete operation failed: %w", err)
}
}
// If there is still an entry in the database and the deletion went through without errors, it means that there is now no
// configuration active in the kernel.
// Restore one configuration for the <ip,mac> directly from the database; note that it is guaranteed that there is one
peerKey, peerEntry, err := d.peerDbSearch(nid, peerIP)
if err != nil {
log.G(context.TODO()).Errorf("peerDeleteOp unable to restore a configuration for nid:%s ip:%v mac:%v err:%s", nid, peerIP, peerMac, err)
return err
}
return d.peerAddOp(nid, peerEntry.eid, peerIP, peerEntry.peerIPMask, peerKey.peerMac, peerEntry.vtep, false, false, false, peerEntry.isLocal)
}
func (d *driver) peerFlush(nid string) {
d.peerOpMu.Lock()
defer d.peerOpMu.Unlock()
if err := d.peerFlushOp(nid); err != nil {
log.G(context.TODO()).WithError(err).Warn("Peer flush operation failed")
}
}
func (d *driver) peerFlushOp(nid string) error {
d.peerDb.Lock()
defer d.peerDb.Unlock()
_, ok := d.peerDb.mp[nid]
if !ok {
return fmt.Errorf("Unable to find the peerDB for nid:%s", nid)
}
delete(d.peerDb.mp, nid)
return nil
}
func (d *driver) peerDBUpdateSelf() {
d.peerDbWalk(func(nid string, pkey *peerKey, pEntry *peerEntry) bool {
if pEntry.isLocal {
pEntry.vtep = d.advertiseAddress
// deleteNeighbor removes programming from the kernel for the given peer to be
// reachable through the VXLAN tunnel. It is the inverse of [driver.addNeighbor].
func (n *network) deleteNeighbor(peerIP netip.Prefix, peerMac hashable.MACAddr, vtep netip.Addr) error {
if n.sbox == nil {
return nil
}
if n.secure {
if err := n.driver.removeEncryption(vtep); err != nil {
log.G(context.TODO()).Warn(err)
}
return false
})
}
s := n.getSubnetforIP(peerIP)
if s == nil {
return fmt.Errorf("could not find the subnet %q in network %q", peerIP.String(), n.id)
}
// Remove fdb entry to the bridge for the peer mac
if n.fdbCnt.Add(hashable.IPMACFrom(vtep, peerMac), -1) == 0 {
if err := n.sbox.DeleteNeighbor(vtep.AsSlice(), peerMac.AsSlice(), osl.WithLinkName(s.vxlanName), osl.WithFamily(syscall.AF_BRIDGE)); err != nil {
return fmt.Errorf("could not delete fdb entry in the sandbox: %w", err)
}
}
// Delete neighbor entry for the peer IP
if err := n.sbox.DeleteNeighbor(peerIP.Addr().AsSlice(), peerMac.AsSlice(), osl.WithLinkName(s.vxlanName)); err != nil {
return fmt.Errorf("could not delete neighbor entry in the sandbox: %w", err)
}
return nil
}


@@ -1,32 +0,0 @@
//go:build linux
package overlay
import (
"net"
"testing"
)
func TestPeerMarshal(t *testing.T) {
_, ipNet, _ := net.ParseCIDR("192.168.0.1/24")
p := &peerEntry{
eid: "eid",
isLocal: true,
peerIPMask: ipNet.Mask,
vtep: ipNet.IP,
}
entryDB := p.MarshalDB()
x := entryDB.UnMarshalDB()
if x.eid != p.eid {
t.Fatalf("Incorrect Unmarshalling for eid: %v != %v", x.eid, p.eid)
}
if x.isLocal != p.isLocal {
t.Fatalf("Incorrect Unmarshalling for isLocal: %v != %v", x.isLocal, p.isLocal)
}
if x.peerIPMask.String() != p.peerIPMask.String() {
t.Fatalf("Incorrect Unmarshalling for eid: %v != %v", x.peerIPMask, p.peerIPMask)
}
if x.vtep.String() != p.vtep.String() {
t.Fatalf("Incorrect Unmarshalling for eid: %v != %v", x.vtep, p.vtep)
}
}


@@ -151,13 +151,6 @@ func (d *driver) NetworkFree(id string) error {
return d.call("FreeNetwork", fr, &api.FreeNetworkResponse{})
}
func (d *driver) EventNotify(etype driverapi.EventType, nid, tableName, key string, value []byte) {
}
func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, map[string]string) {
return "", nil
}
func (d *driver) CreateNetwork(id string, options map[string]interface{}, nInfo driverapi.NetworkInfo, ipV4Data, ipV6Data []driverapi.IPAMData) error {
create := &api.CreateNetworkRequest{
NetworkID: id,


@@ -3,11 +3,10 @@ package overlay
import (
"context"
"fmt"
"net"
"github.com/containerd/log"
"github.com/docker/docker/libnetwork/driverapi"
"github.com/docker/docker/libnetwork/types"
"github.com/docker/docker/libnetwork/drivers/overlay"
"github.com/gogo/protobuf/proto"
)
@@ -27,7 +26,7 @@ func (d *driver) Join(nid, eid string, sboxKey string, jinfo driverapi.JoinInfo,
return fmt.Errorf("could not find endpoint with id %s", eid)
}
buf, err := proto.Marshal(&PeerRecord{
buf, err := proto.Marshal(&overlay.PeerRecord{
EndpointIP: ep.addr.String(),
EndpointMAC: ep.mac.String(),
TunnelEndpointIP: n.providerAddress,
@@ -36,7 +35,7 @@ func (d *driver) Join(nid, eid string, sboxKey string, jinfo driverapi.JoinInfo,
return err
}
if err := jinfo.AddTableEntry(ovPeerTable, eid, buf); err != nil {
if err := jinfo.AddTableEntry(overlay.OverlayPeerTable, eid, buf); err != nil {
log.G(context.TODO()).Errorf("overlay: Failed adding table entry to joininfo: %v", err)
}
@@ -47,57 +46,68 @@ func (d *driver) Join(nid, eid string, sboxKey string, jinfo driverapi.JoinInfo,
return nil
}
func (d *driver) EventNotify(etype driverapi.EventType, nid, tableName, key string, value []byte) {
if tableName != ovPeerTable {
func (d *driver) EventNotify(nid, tableName, key string, prev, value []byte) {
if tableName != overlay.OverlayPeerTable {
log.G(context.TODO()).Errorf("Unexpected table notification for table %s received", tableName)
return
}
eid := key
var peer PeerRecord
if err := proto.Unmarshal(value, &peer); err != nil {
log.G(context.TODO()).Errorf("Failed to unmarshal peer record: %v", err)
return
}
n := d.network(nid)
if n == nil {
return
}
// Ignore local peers. We already know about them and they
// should not be added to vxlan fdb.
if peer.TunnelEndpointIP == n.providerAddress {
var prevPeer, newPeer *overlay.Peer
if prev != nil {
var err error
prevPeer, err = overlay.UnmarshalPeerRecord(prev)
if err != nil {
log.G(context.TODO()).WithError(err).Error("Failed to unmarshal previous peer record")
} else if prevPeer.TunnelEndpointIP.String() == n.providerAddress {
// Ignore local peers. We don't add them to the VXLAN
// FDB so don't need to remove them.
prevPeer = nil
}
}
if value != nil {
var err error
newPeer, err = overlay.UnmarshalPeerRecord(value)
if err != nil {
log.G(context.TODO()).WithError(err).Error("Failed to unmarshal peer record")
} else if newPeer.TunnelEndpointIP.String() == n.providerAddress {
newPeer = nil
}
}
if prevPeer == nil && newPeer == nil {
// Nothing to do! Either the event was for a local peer,
// or unmarshaling failed.
return
}
if prevPeer != nil && newPeer != nil && *prevPeer == *newPeer {
// The update did not materially change the FDB entry.
return
}
addr, err := types.ParseCIDR(peer.EndpointIP)
if err != nil {
log.G(context.TODO()).Errorf("Invalid peer IP %s received in event notify", peer.EndpointIP)
return
if prevPeer != nil {
if err := d.peerDelete(nid, eid, prevPeer.EndpointIP.Addr().AsSlice(), true); err != nil {
log.G(context.TODO()).WithFields(log.Fields{
"error": err,
"nid": n.id,
"peer": prevPeer,
}).Warn("overlay: failed to delete peer entry")
}
}
mac, err := net.ParseMAC(peer.EndpointMAC)
if err != nil {
log.G(context.TODO()).Errorf("Invalid mac %s received in event notify", peer.EndpointMAC)
return
}
vtep := net.ParseIP(peer.TunnelEndpointIP)
if vtep == nil {
log.G(context.TODO()).Errorf("Invalid VTEP %s received in event notify", peer.TunnelEndpointIP)
return
}
if etype == driverapi.Delete {
d.peerDelete(nid, eid, addr.IP, addr.Mask, mac, vtep, true)
return
}
err = d.peerAdd(nid, eid, addr.IP, addr.Mask, mac, vtep, true)
if err != nil {
log.G(context.TODO()).Errorf("peerAdd failed (%v) for ip %s with mac %s", err, addr.IP.String(), mac.String())
if newPeer != nil {
if err := d.peerAdd(nid, eid, newPeer.EndpointIP.Addr().AsSlice(), newPeer.EndpointMAC.AsSlice(), newPeer.TunnelEndpointIP.AsSlice(), true); err != nil {
log.G(context.TODO()).WithFields(log.Fields{
"error": err,
"nid": n.id,
"peer": newPeer,
}).Warn("overlay: failed to add peer entry")
}
}
}
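The reworked `EventNotify` receives both the previous and current record and turns the pair into delete/add operations, skipping events that do not materially change anything. A sketch of that decision table (the `peer`/`diffOps` names are illustrative):

```go
package main

import "fmt"

type peer struct{ ip, mac, vtep string }

// diffOps mirrors the prev/value handling in EventNotify: a nil prev means a
// create, a nil next means a delete, and equal records mean nothing to do.
func diffOps(prev, next *peer) (del, add bool) {
	if prev != nil && next != nil && *prev == *next {
		return false, false // the update did not materially change the entry
	}
	return prev != nil, next != nil
}

func main() {
	a := &peer{"10.0.0.2/24", "02:42:0a:00:00:02", "192.0.2.10"}
	b := &peer{"10.0.0.2/24", "02:42:0a:00:00:02", "192.0.2.11"} // peer moved hosts
	fmt.Println(diffOps(nil, a)) // false true : create
	fmt.Println(diffOps(a, b))   // true true  : moved, delete old then add new
	fmt.Println(diffOps(a, a))   // false false: redundant update
	fmt.Println(diffOps(a, nil)) // true false : delete
}
```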


@@ -12,6 +12,7 @@ import (
"github.com/Microsoft/hcsshim"
"github.com/containerd/log"
"github.com/docker/docker/libnetwork/driverapi"
"github.com/docker/docker/libnetwork/drivers/overlay"
"github.com/docker/docker/libnetwork/netlabel"
"github.com/docker/docker/libnetwork/portmapper"
"github.com/docker/docker/libnetwork/types"
@@ -173,7 +174,7 @@ func (d *driver) CreateNetwork(id string, option map[string]interface{}, nInfo d
n.interfaceName = interfaceName
if nInfo != nil {
if err := nInfo.TableEventRegister(ovPeerTable, driverapi.EndpointObject); err != nil {
if err := nInfo.TableEventRegister(overlay.OverlayPeerTable, driverapi.EndpointObject); err != nil {
return err
}
}


@@ -1,455 +0,0 @@
// Code generated by protoc-gen-gogo. DO NOT EDIT.
// source: drivers/windows/overlay/overlay.proto
/*
Package overlay is a generated protocol buffer package.
It is generated from these files:
drivers/windows/overlay/overlay.proto
It has these top-level messages:
PeerRecord
*/
package overlay
import proto "github.com/gogo/protobuf/proto"
import fmt "fmt"
import math "math"
import _ "github.com/gogo/protobuf/gogoproto"
import strings "strings"
import reflect "reflect"
import io "io"
// Reference imports to suppress errors if they are not otherwise used.
var _ = proto.Marshal
var _ = fmt.Errorf
var _ = math.Inf
// This is a compile-time assertion to ensure that this generated file
// is compatible with the proto package it is being compiled against.
// A compilation error at this line likely means your copy of the
// proto package needs to be updated.
const _ = proto.GoGoProtoPackageIsVersion2 // please upgrade the proto package
// PeerRecord defines the information corresponding to a peer
// container in the overlay network.
type PeerRecord struct {
// Endpoint IP is the IP of the container attachment on the
// given overlay network.
EndpointIP string `protobuf:"bytes,1,opt,name=endpoint_ip,json=endpointIp,proto3" json:"endpoint_ip,omitempty"`
// Endpoint MAC is the mac address of the container attachment
// on the given overlay network.
EndpointMAC string `protobuf:"bytes,2,opt,name=endpoint_mac,json=endpointMac,proto3" json:"endpoint_mac,omitempty"`
// Tunnel Endpoint IP defines the host IP for the host in
// which this container is running and can be reached by
// building a tunnel to that host IP.
TunnelEndpointIP string `protobuf:"bytes,3,opt,name=tunnel_endpoint_ip,json=tunnelEndpointIp,proto3" json:"tunnel_endpoint_ip,omitempty"`
}
func (m *PeerRecord) Reset() { *m = PeerRecord{} }
func (*PeerRecord) ProtoMessage() {}
func (*PeerRecord) Descriptor() ([]byte, []int) { return fileDescriptorOverlay, []int{0} }
func (m *PeerRecord) GetEndpointIP() string {
if m != nil {
return m.EndpointIP
}
return ""
}
func (m *PeerRecord) GetEndpointMAC() string {
if m != nil {
return m.EndpointMAC
}
return ""
}
func (m *PeerRecord) GetTunnelEndpointIP() string {
if m != nil {
return m.TunnelEndpointIP
}
return ""
}
func init() {
proto.RegisterType((*PeerRecord)(nil), "overlay.PeerRecord")
}
func (this *PeerRecord) GoString() string {
if this == nil {
return "nil"
}
s := make([]string, 0, 7)
s = append(s, "&overlay.PeerRecord{")
s = append(s, "EndpointIP: "+fmt.Sprintf("%#v", this.EndpointIP)+",\n")
s = append(s, "EndpointMAC: "+fmt.Sprintf("%#v", this.EndpointMAC)+",\n")
s = append(s, "TunnelEndpointIP: "+fmt.Sprintf("%#v", this.TunnelEndpointIP)+",\n")
s = append(s, "}")
return strings.Join(s, "")
}
func valueToGoStringOverlay(v interface{}, typ string) string {
rv := reflect.ValueOf(v)
if rv.IsNil() {
return "nil"
}
pv := reflect.Indirect(rv).Interface()
return fmt.Sprintf("func(v %v) *%v { return &v } ( %#v )", typ, typ, pv)
}
func (m *PeerRecord) Marshal() (dAtA []byte, err error) {
size := m.Size()
dAtA = make([]byte, size)
n, err := m.MarshalTo(dAtA)
if err != nil {
return nil, err
}
return dAtA[:n], nil
}
func (m *PeerRecord) MarshalTo(dAtA []byte) (int, error) {
var i int
_ = i
var l int
_ = l
if len(m.EndpointIP) > 0 {
dAtA[i] = 0xa
i++
i = encodeVarintOverlay(dAtA, i, uint64(len(m.EndpointIP)))
i += copy(dAtA[i:], m.EndpointIP)
}
if len(m.EndpointMAC) > 0 {
dAtA[i] = 0x12
i++
i = encodeVarintOverlay(dAtA, i, uint64(len(m.EndpointMAC)))
i += copy(dAtA[i:], m.EndpointMAC)
}
if len(m.TunnelEndpointIP) > 0 {
dAtA[i] = 0x1a
i++
i = encodeVarintOverlay(dAtA, i, uint64(len(m.TunnelEndpointIP)))
i += copy(dAtA[i:], m.TunnelEndpointIP)
}
return i, nil
}
func encodeVarintOverlay(dAtA []byte, offset int, v uint64) int {
for v >= 1<<7 {
dAtA[offset] = uint8(v&0x7f | 0x80)
v >>= 7
offset++
}
dAtA[offset] = uint8(v)
return offset + 1
}
func (m *PeerRecord) Size() (n int) {
var l int
_ = l
l = len(m.EndpointIP)
if l > 0 {
n += 1 + l + sovOverlay(uint64(l))
}
l = len(m.EndpointMAC)
if l > 0 {
n += 1 + l + sovOverlay(uint64(l))
}
l = len(m.TunnelEndpointIP)
if l > 0 {
n += 1 + l + sovOverlay(uint64(l))
}
return n
}
func sovOverlay(x uint64) (n int) {
for {
n++
x >>= 7
if x == 0 {
break
}
}
return n
}
func sozOverlay(x uint64) (n int) {
return sovOverlay(uint64((x << 1) ^ uint64((int64(x) >> 63))))
}
func (this *PeerRecord) String() string {
if this == nil {
return "nil"
}
s := strings.Join([]string{`&PeerRecord{`,
`EndpointIP:` + fmt.Sprintf("%v", this.EndpointIP) + `,`,
`EndpointMAC:` + fmt.Sprintf("%v", this.EndpointMAC) + `,`,
`TunnelEndpointIP:` + fmt.Sprintf("%v", this.TunnelEndpointIP) + `,`,
`}`,
}, "")
return s
}
func valueToStringOverlay(v interface{}) string {
rv := reflect.ValueOf(v)
if rv.IsNil() {
return "nil"
}
pv := reflect.Indirect(rv).Interface()
return fmt.Sprintf("*%v", pv)
}
func (m *PeerRecord) Unmarshal(dAtA []byte) error {
l := len(dAtA)
iNdEx := 0
for iNdEx < l {
preIndex := iNdEx
var wire uint64
for shift := uint(0); ; shift += 7 {
if shift >= 64 {
return ErrIntOverflowOverlay
}
if iNdEx >= l {
return io.ErrUnexpectedEOF
}
b := dAtA[iNdEx]
iNdEx++
wire |= (uint64(b) & 0x7F) << shift
if b < 0x80 {
break
}
}
fieldNum := int32(wire >> 3)
wireType := int(wire & 0x7)
if wireType == 4 {
return fmt.Errorf("proto: PeerRecord: wiretype end group for non-group")
}
if fieldNum <= 0 {
return fmt.Errorf("proto: PeerRecord: illegal tag %d (wire type %d)", fieldNum, wire)
}
switch fieldNum {
case 1:
if wireType != 2 {
return fmt.Errorf("proto: wrong wireType = %d for field EndpointIP", wireType)
}
var stringLen uint64
for shift := uint(0); ; shift += 7 {
if shift >= 64 {
return ErrIntOverflowOverlay
}
if iNdEx >= l {
return io.ErrUnexpectedEOF
}
b := dAtA[iNdEx]
iNdEx++
stringLen |= (uint64(b) & 0x7F) << shift
if b < 0x80 {
break
}
}
intStringLen := int(stringLen)
if intStringLen < 0 {
return ErrInvalidLengthOverlay
}
postIndex := iNdEx + intStringLen
if postIndex > l {
return io.ErrUnexpectedEOF
}
m.EndpointIP = string(dAtA[iNdEx:postIndex])
iNdEx = postIndex
case 2:
if wireType != 2 {
return fmt.Errorf("proto: wrong wireType = %d for field EndpointMAC", wireType)
}
var stringLen uint64
for shift := uint(0); ; shift += 7 {
if shift >= 64 {
return ErrIntOverflowOverlay
}
if iNdEx >= l {
return io.ErrUnexpectedEOF
}
b := dAtA[iNdEx]
iNdEx++
stringLen |= (uint64(b) & 0x7F) << shift
if b < 0x80 {
break
}
}
intStringLen := int(stringLen)
if intStringLen < 0 {
return ErrInvalidLengthOverlay
}
postIndex := iNdEx + intStringLen
if postIndex > l {
return io.ErrUnexpectedEOF
}
m.EndpointMAC = string(dAtA[iNdEx:postIndex])
iNdEx = postIndex
case 3:
if wireType != 2 {
return fmt.Errorf("proto: wrong wireType = %d for field TunnelEndpointIP", wireType)
}
var stringLen uint64
for shift := uint(0); ; shift += 7 {
if shift >= 64 {
return ErrIntOverflowOverlay
}
if iNdEx >= l {
return io.ErrUnexpectedEOF
}
b := dAtA[iNdEx]
iNdEx++
stringLen |= (uint64(b) & 0x7F) << shift
if b < 0x80 {
break
}
}
intStringLen := int(stringLen)
if intStringLen < 0 {
return ErrInvalidLengthOverlay
}
postIndex := iNdEx + intStringLen
if postIndex > l {
return io.ErrUnexpectedEOF
}
m.TunnelEndpointIP = string(dAtA[iNdEx:postIndex])
iNdEx = postIndex
default:
iNdEx = preIndex
skippy, err := skipOverlay(dAtA[iNdEx:])
if err != nil {
return err
}
if skippy < 0 {
return ErrInvalidLengthOverlay
}
if (iNdEx + skippy) > l {
return io.ErrUnexpectedEOF
}
iNdEx += skippy
}
}
if iNdEx > l {
return io.ErrUnexpectedEOF
}
return nil
}
func skipOverlay(dAtA []byte) (n int, err error) {
l := len(dAtA)
iNdEx := 0
for iNdEx < l {
var wire uint64
for shift := uint(0); ; shift += 7 {
if shift >= 64 {
return 0, ErrIntOverflowOverlay
}
if iNdEx >= l {
return 0, io.ErrUnexpectedEOF
}
b := dAtA[iNdEx]
iNdEx++
wire |= (uint64(b) & 0x7F) << shift
if b < 0x80 {
break
}
}
wireType := int(wire & 0x7)
switch wireType {
case 0:
for shift := uint(0); ; shift += 7 {
if shift >= 64 {
return 0, ErrIntOverflowOverlay
}
if iNdEx >= l {
return 0, io.ErrUnexpectedEOF
}
iNdEx++
if dAtA[iNdEx-1] < 0x80 {
break
}
}
return iNdEx, nil
case 1:
iNdEx += 8
return iNdEx, nil
case 2:
var length int
for shift := uint(0); ; shift += 7 {
if shift >= 64 {
return 0, ErrIntOverflowOverlay
}
if iNdEx >= l {
return 0, io.ErrUnexpectedEOF
}
b := dAtA[iNdEx]
iNdEx++
length |= (int(b) & 0x7F) << shift
if b < 0x80 {
break
}
}
iNdEx += length
if length < 0 {
return 0, ErrInvalidLengthOverlay
}
return iNdEx, nil
case 3:
for {
var innerWire uint64
var start int = iNdEx
for shift := uint(0); ; shift += 7 {
if shift >= 64 {
return 0, ErrIntOverflowOverlay
}
if iNdEx >= l {
return 0, io.ErrUnexpectedEOF
}
b := dAtA[iNdEx]
iNdEx++
innerWire |= (uint64(b) & 0x7F) << shift
if b < 0x80 {
break
}
}
innerWireType := int(innerWire & 0x7)
if innerWireType == 4 {
break
}
next, err := skipOverlay(dAtA[start:])
if err != nil {
return 0, err
}
iNdEx = start + next
}
return iNdEx, nil
case 4:
return iNdEx, nil
case 5:
iNdEx += 4
return iNdEx, nil
default:
return 0, fmt.Errorf("proto: illegal wireType %d", wireType)
}
}
panic("unreachable")
}
var (
ErrInvalidLengthOverlay = fmt.Errorf("proto: negative length found during unmarshaling")
ErrIntOverflowOverlay = fmt.Errorf("proto: integer overflow")
)
func init() { proto.RegisterFile("drivers/windows/overlay/overlay.proto", fileDescriptorOverlay) }
var fileDescriptorOverlay = []byte{
// 220 bytes of a gzipped FileDescriptorProto
0x1f, 0x8b, 0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x02, 0xff, 0xe2, 0x52, 0x4d, 0x29, 0xca, 0x2c,
0x4b, 0x2d, 0x2a, 0xd6, 0x2f, 0xcf, 0xcc, 0x4b, 0xc9, 0x2f, 0x2f, 0xd6, 0xcf, 0x2f, 0x4b, 0x2d,
0xca, 0x49, 0xac, 0x84, 0xd1, 0x7a, 0x05, 0x45, 0xf9, 0x25, 0xf9, 0x42, 0xec, 0x50, 0xae, 0x94,
0x48, 0x7a, 0x7e, 0x7a, 0x3e, 0x58, 0x4c, 0x1f, 0xc4, 0x82, 0x48, 0x2b, 0x6d, 0x65, 0xe4, 0xe2,
0x0a, 0x48, 0x4d, 0x2d, 0x0a, 0x4a, 0x4d, 0xce, 0x2f, 0x4a, 0x11, 0xd2, 0xe7, 0xe2, 0x4e, 0xcd,
0x4b, 0x29, 0xc8, 0xcf, 0xcc, 0x2b, 0x89, 0xcf, 0x2c, 0x90, 0x60, 0x54, 0x60, 0xd4, 0xe0, 0x74,
0xe2, 0x7b, 0x74, 0x4f, 0x9e, 0xcb, 0x15, 0x2a, 0xec, 0x19, 0x10, 0xc4, 0x05, 0x53, 0xe2, 0x59,
0x20, 0x64, 0xc4, 0xc5, 0x03, 0xd7, 0x90, 0x9b, 0x98, 0x2c, 0xc1, 0x04, 0xd6, 0xc1, 0xff, 0xe8,
0x9e, 0x3c, 0x37, 0x4c, 0x87, 0xaf, 0xa3, 0x73, 0x10, 0xdc, 0x54, 0xdf, 0xc4, 0x64, 0x21, 0x27,
0x2e, 0xa1, 0x92, 0xd2, 0xbc, 0xbc, 0xd4, 0x9c, 0x78, 0x64, 0xbb, 0x98, 0xc1, 0x3a, 0x45, 0x1e,
0xdd, 0x93, 0x17, 0x08, 0x01, 0xcb, 0x22, 0xd9, 0x28, 0x50, 0x82, 0x2a, 0x52, 0xe0, 0x24, 0x71,
0xe3, 0xa1, 0x1c, 0xc3, 0x87, 0x87, 0x72, 0x8c, 0x0d, 0x8f, 0xe4, 0x18, 0x4f, 0x3c, 0x92, 0x63,
0xbc, 0xf0, 0x48, 0x8e, 0xf1, 0xc1, 0x23, 0x39, 0xc6, 0x24, 0x36, 0xb0, 0xc7, 0x8c, 0x01, 0x01,
0x00, 0x00, 0xff, 0xff, 0xc0, 0x48, 0xd1, 0xc0, 0x20, 0x01, 0x00, 0x00,
}
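Both `Unmarshal` and `skipOverlay` above repeatedly decode protobuf base-128 varints with the same shift loop. A minimal standalone sketch of that loop (names here are illustrative, not from the generated code):

```go
package main

import "fmt"

// decodeUvarint decodes a protobuf base-128 varint from b, returning the
// value and the number of bytes consumed. It returns (0, 0) when b is
// truncated or the value would overflow 64 bits, mirroring the
// io.ErrUnexpectedEOF and ErrIntOverflowOverlay cases above.
func decodeUvarint(b []byte) (v uint64, n int) {
	for shift := uint(0); shift < 64; shift += 7 {
		if n >= len(b) {
			return 0, 0 // unexpected EOF
		}
		c := b[n]
		n++
		v |= (uint64(c) & 0x7F) << shift
		if c < 0x80 { // high bit clear: last byte of the varint
			return v, n
		}
	}
	return 0, 0 // more than 64 bits: overflow
}

func main() {
	v, n := decodeUvarint([]byte{0xAC, 0x02}) // 300 encoded in two bytes
	fmt.Println(v, n)                         // 300 2
}
```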


@@ -1,27 +0,0 @@
syntax = "proto3";
import "github.com/gogo/protobuf/gogoproto/gogo.proto";
package overlay;
option (gogoproto.marshaler_all) = true;
option (gogoproto.unmarshaler_all) = true;
option (gogoproto.stringer_all) = true;
option (gogoproto.gostring_all) = true;
option (gogoproto.sizer_all) = true;
option (gogoproto.goproto_stringer_all) = false;
// PeerRecord defines the information corresponding to a peer
// container in the overlay network.
message PeerRecord {
// Endpoint IP is the IP of the container attachment on the
// given overlay network.
string endpoint_ip = 1 [(gogoproto.customname) = "EndpointIP"];
// Endpoint MAC is the mac address of the container attachment
// on the given overlay network.
string endpoint_mac = 2 [(gogoproto.customname) = "EndpointMAC"];
// Tunnel Endpoint IP defines the host IP for the host in
// which this container is running and can be reached by
// building a tunnel to that host IP.
string tunnel_endpoint_ip = 3 [(gogoproto.customname) = "TunnelEndpointIP"];
}


@@ -1,7 +1,5 @@
package overlay
//go:generate protoc -I=. -I=../../../../vendor/ --gogo_out=import_path=github.com/docker/docker/libnetwork/drivers/overlay:. overlay.proto
import (
"context"
"encoding/json"
@@ -18,6 +16,8 @@ const (
NetworkType = "overlay"
)
var _ driverapi.TableWatcher = (*driver)(nil)
type driver struct {
networks networkTable
sync.Mutex


@@ -11,9 +11,7 @@ import (
"github.com/docker/docker/libnetwork/types"
)
const ovPeerTable = "overlay_peer_table"
func (d *driver) peerAdd(nid, eid string, peerIP net.IP, peerIPMask net.IPMask, peerMac net.HardwareAddr, vtep net.IP, updateDb bool) error {
func (d *driver) peerAdd(nid, eid string, peerIP net.IP, peerMac net.HardwareAddr, vtep net.IP, updateDb bool) error {
log.G(context.TODO()).Debugf("WINOVERLAY: Enter peerAdd for ca ip %s with ca mac %s", peerIP.String(), peerMac.String())
if err := validateID(nid, eid); err != nil {
@@ -83,7 +81,7 @@ func (d *driver) peerAdd(nid, eid string, peerIP net.IP, peerIPMask net.IPMask,
return nil
}
func (d *driver) peerDelete(nid, eid string, peerIP net.IP, peerIPMask net.IPMask, peerMac net.HardwareAddr, vtep net.IP, updateDb bool) error {
func (d *driver) peerDelete(nid, eid string, peerIP net.IP, updateDb bool) error {
log.G(context.TODO()).Infof("WINOVERLAY: Enter peerDelete for endpoint %s and peer ip %s", eid, peerIP.String())
if err := validateID(nid, eid); err != nil {


@@ -236,7 +236,7 @@ func (d *driver) parseNetworkOptions(id string, genericOptions map[string]string
return config, nil
}
func (c *networkConfiguration) processIPAM(id string, ipamV4Data, ipamV6Data []driverapi.IPAMData) error {
func (ncfg *networkConfiguration) processIPAM(id string, ipamV4Data, ipamV6Data []driverapi.IPAMData) error {
if len(ipamV6Data) > 0 {
return types.ForbiddenErrorf("windowsshim driver doesn't support v6 subnets")
}
@@ -248,13 +248,6 @@ func (c *networkConfiguration) processIPAM(id string, ipamV4Data, ipamV6Data []d
return nil
}
func (d *driver) EventNotify(etype driverapi.EventType, nid, tableName, key string, value []byte) {
}
func (d *driver) DecodeTableEntry(tablename string, key string, value []byte) (string, map[string]string) {
return "", nil
}
func (d *driver) createNetwork(config *networkConfiguration) *hnsNetwork {
network := &hnsNetwork{
id: config.ID,


@@ -60,7 +60,7 @@ func (ir *IPAMs) RegisterIpamDriver(name string, driver ipamapi.Ipam) error {
}
// IPAMWalkFunc defines the IPAM driver table walker function signature.
type IPAMWalkFunc func(name string, driver ipamapi.Ipam, cap *ipamapi.Capability) bool
type IPAMWalkFunc func(name string, driver ipamapi.Ipam, capability *ipamapi.Capability) bool
// WalkIPAMs walks the IPAM drivers registered in the registry and invokes the passed walk function and each one of them.
func (ir *IPAMs) WalkIPAMs(ifn IPAMWalkFunc) {


@@ -36,7 +36,7 @@ func TestIPAMs(t *testing.T) {
reg := getNewIPAMs(t)
ipams := make([]string, 0, 2)
reg.WalkIPAMs(func(name string, driver ipamapi.Ipam, cap *ipamapi.Capability) bool {
reg.WalkIPAMs(func(name string, driver ipamapi.Ipam, capability *ipamapi.Capability) bool {
ipams = append(ipams, name)
return false
})


@@ -49,9 +49,9 @@ func TestNetworks(t *testing.T) {
err := reg.RegisterDriver(mockDriverName, &md, mockDriverCaps)
assert.NilError(t, err)
d, cap := reg.Driver(mockDriverName)
assert.Check(t, d != nil)
assert.Check(t, is.DeepEqual(cap, mockDriverCaps))
driver, capability := reg.Driver(mockDriverName)
assert.Check(t, driver != nil)
assert.Check(t, is.DeepEqual(capability, mockDriverCaps))
})
t.Run("WalkDrivers", func(t *testing.T) {


@@ -3,6 +3,7 @@ package libnetwork
import (
"context"
"fmt"
"net"
"github.com/containerd/log"
"github.com/docker/docker/libnetwork/iptables"
@@ -44,11 +45,71 @@ func setupUserChain(ipVersion iptables.IPVersion) error {
if _, err := ipt.NewChain(userChain, iptables.Filter, false); err != nil {
return fmt.Errorf("failed to create %s %v chain: %v", userChain, ipVersion, err)
}
if err := ipt.AddReturnRule(userChain); err != nil {
if err := ipt.AddReturnRule(iptables.Filter, userChain); err != nil {
return fmt.Errorf("failed to add the RETURN rule for %s %v: %w", userChain, ipVersion, err)
}
if err := ipt.EnsureJumpRule("FORWARD", userChain); err != nil {
if err := ipt.EnsureJumpRule(iptables.Filter, "FORWARD", userChain); err != nil {
return fmt.Errorf("failed to ensure the jump rule for %s %v: %w", userChain, ipVersion, err)
}
return nil
}
func (c *Controller) setupPlatformFirewall() {
setupArrangeUserFilterRule(c)
// Add handler for iptables rules restoration in case of a firewalld reload
c.handleFirewalldReload()
}
func (c *Controller) handleFirewalldReload() {
handler := func() {
services := make(map[serviceKey]*service)
c.mu.Lock()
for k, s := range c.serviceBindings {
if k.ports != "" && len(s.ingressPorts) != 0 {
services[k] = s
}
}
c.mu.Unlock()
for _, s := range services {
c.handleFirewallReloadService(s)
}
}
// Add handler for iptables rules restoration in case of a firewalld reload
iptables.OnReloaded(handler)
}
func (c *Controller) handleFirewallReloadService(s *service) {
s.Lock()
defer s.Unlock()
if s.deleted {
log.G(context.TODO()).Debugf("handleFirewallReloadService called for deleted service %s/%s", s.id, s.name)
return
}
for nid := range s.loadBalancers {
n, err := c.NetworkByID(nid)
if err != nil {
continue
}
ep, sb, err := n.findLBEndpointSandbox()
if err != nil {
log.G(context.TODO()).Warnf("handleFirewallReloadService failed to find LB Endpoint Sandbox for %s/%s: %v -- ", n.ID(), n.Name(), err)
continue
}
if sb.osSbox == nil {
return
}
if ep != nil {
var gwIP net.IP
if gwEP := sb.getGatewayEndpoint(); gwEP != nil {
gwIP = gwEP.Iface().Address().IP
}
if err := restoreIngressPorts(gwIP, s.ingressPorts); err != nil {
log.G(context.TODO()).Errorf("Failed to add ingress: %v", err)
return
}
}
}
}


@@ -5,3 +5,5 @@ package libnetwork
func setupArrangeUserFilterRule(c *Controller) {}
func arrangeUserFilterRule() {}
func setupUserChain(ipVersion any) error { return nil }
func (c *Controller) setupPlatformFirewall() {}


@@ -0,0 +1,19 @@
// FIXME(thaJeztah): remove once we are a module; the go:build directive prevents go from downgrading language version to go1.16:
//go:build go1.23
package countmap
// Map is a map of counters.
type Map[T comparable] map[T]int
// Add adds delta to the counter for v and returns the new value.
//
// If the new value is 0, the entry is removed from the map.
func (m Map[T]) Add(v T, delta int) int {
m[v] += delta
c := m[v]
if c == 0 {
delete(m, v)
}
return c
}
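The counter map above drops entries as soon as they reach zero, so the map's length tracks only live counters. A standalone sketch of that behavior (the type is inlined here rather than imported from libnetwork/internal/countmap):

```go
package main

import "fmt"

// Map is a map of counters; entries that reach zero are removed,
// mirroring countmap.Map above.
type Map[T comparable] map[T]int

// Add adds delta to the counter for v and returns the new value.
// If the new value is 0, the entry is removed from the map.
func (m Map[T]) Add(v T, delta int) int {
	m[v] += delta
	c := m[v]
	if c == 0 {
		delete(m, v)
	}
	return c
}

func main() {
	m := Map[string]{}
	m.Add("eth0", 1)
	m.Add("eth0", 1)
	fmt.Println(m.Add("eth0", -2)) // 0: the entry is removed
	fmt.Println(len(m))            // 0: no stale zero-valued entries
}
```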


@@ -0,0 +1,27 @@
package countmap_test
import (
"testing"
"github.com/docker/docker/libnetwork/internal/countmap"
"gotest.tools/v3/assert"
is "gotest.tools/v3/assert/cmp"
)
func TestMap(t *testing.T) {
m := countmap.Map[string]{}
m["foo"] = 7
m["bar"] = 2
m["zeroed"] = -2
m.Add("bar", -3)
m.Add("foo", -8)
m.Add("baz", 1)
m.Add("zeroed", 2)
assert.Check(t, is.DeepEqual(m, countmap.Map[string]{"foo": -1, "bar": -1, "baz": 1}))
m.Add("foo", 1)
m.Add("bar", 1)
m.Add("baz", -1)
assert.Check(t, is.DeepEqual(m, countmap.Map[string]{}))
}


@@ -0,0 +1,82 @@
// FIXME(thaJeztah): remove once we are a module; the go:build directive prevents go from downgrading language version to go1.16:
//go:build go1.23
// Package hashable provides handy utility types for making unhashable values
// hashable.
package hashable
import (
"net"
"net/netip"
)
// MACAddr is a hashable encoding of a MAC address.
type MACAddr uint64
// MACAddrFromSlice parses the 6-byte slice as a MAC-48 address.
// Note that a [net.HardwareAddr] can be passed directly as the []byte argument.
// If slice's length is not 6, MACAddrFromSlice returns 0, false.
func MACAddrFromSlice(slice net.HardwareAddr) (MACAddr, bool) {
if len(slice) != 6 {
return 0, false
}
return MACAddrFrom6([6]byte(slice)), true
}
// MACAddrFrom6 returns the address of the MAC-48 address
// given by the bytes in addr.
func MACAddrFrom6(addr [6]byte) MACAddr {
return MACAddr(addr[0])<<40 | MACAddr(addr[1])<<32 | MACAddr(addr[2])<<24 |
MACAddr(addr[3])<<16 | MACAddr(addr[4])<<8 | MACAddr(addr[5])
}
// ParseMAC parses s as an IEEE 802 MAC-48 address using one of the formats
// accepted by [net.ParseMAC].
func ParseMAC(s string) (MACAddr, error) {
hw, err := net.ParseMAC(s)
if err != nil {
return 0, err
}
mac, ok := MACAddrFromSlice(hw)
if !ok {
return 0, &net.AddrError{Err: "not a MAC-48 address", Addr: s}
}
return mac, nil
}
// AsSlice returns a MAC address in its 6-byte representation.
func (p MACAddr) AsSlice() []byte {
mac := [6]byte{
byte(p >> 40), byte(p >> 32), byte(p >> 24),
byte(p >> 16), byte(p >> 8), byte(p),
}
return mac[:]
}
// String returns net.HardwareAddr(p.AsSlice()).String().
func (p MACAddr) String() string {
return net.HardwareAddr(p.AsSlice()).String()
}
// IPMAC is a hashable tuple of an IP address and a MAC address suitable for use as a map key.
type IPMAC struct {
ip netip.Addr
mac MACAddr
}
// IPMACFrom returns an [IPMAC] with the provided IP and MAC addresses.
func IPMACFrom(ip netip.Addr, mac MACAddr) IPMAC {
return IPMAC{ip: ip, mac: mac}
}
func (i IPMAC) String() string {
return i.ip.String() + " " + i.mac.String()
}
func (i IPMAC) IP() netip.Addr {
return i.ip
}
func (i IPMAC) MAC() MACAddr {
return i.mac
}
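Packing the six MAC bytes into a uint64 is what makes `MACAddr` hashable and therefore usable as a map key. A standalone sketch of the round trip (the type is inlined here rather than imported from the hashable package):

```go
package main

import (
	"fmt"
	"net"
)

// MACAddr packs a MAC-48 address into a uint64 so it is comparable and
// hashable, mirroring hashable.MACAddr above.
type MACAddr uint64

// MACAddrFrom6 packs the six address bytes, most significant first.
func MACAddrFrom6(addr [6]byte) MACAddr {
	return MACAddr(addr[0])<<40 | MACAddr(addr[1])<<32 | MACAddr(addr[2])<<24 |
		MACAddr(addr[3])<<16 | MACAddr(addr[4])<<8 | MACAddr(addr[5])
}

// String unpacks the bytes and formats them via net.HardwareAddr.
func (p MACAddr) String() string {
	mac := [6]byte{
		byte(p >> 40), byte(p >> 32), byte(p >> 24),
		byte(p >> 16), byte(p >> 8), byte(p),
	}
	return net.HardwareAddr(mac[:]).String()
}

func main() {
	hw, _ := net.ParseMAC("02:42:ac:11:00:02")
	mac := MACAddrFrom6([6]byte(hw))
	seen := map[MACAddr]bool{mac: true} // usable directly as a map key
	fmt.Println(mac, seen[mac])         // 02:42:ac:11:00:02 true
}
```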


@@ -0,0 +1,66 @@
package hashable
import (
"net"
"net/netip"
"testing"
"gotest.tools/v3/assert"
is "gotest.tools/v3/assert/cmp"
)
// Assert that the types are hashable.
var (
_ map[MACAddr]bool
_ map[IPMAC]bool
)
func TestMACAddrFrom6(t *testing.T) {
want := [6]byte{0x01, 0x02, 0x03, 0x04, 0x05, 0x06}
assert.DeepEqual(t, MACAddrFrom6(want).AsSlice(), want[:])
}
func TestMACAddrFromSlice(t *testing.T) {
mac, ok := MACAddrFromSlice(net.HardwareAddr{0x01, 0x02, 0x03, 0x04, 0x05, 0x06})
assert.Check(t, ok)
assert.Check(t, is.DeepEqual(mac.AsSlice(), []byte{0x01, 0x02, 0x03, 0x04, 0x05, 0x06}))
// Invalid length
for _, tc := range [][]byte{
{0x01, 0x02, 0x03, 0x04, 0x05},
{0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07},
{},
nil,
} {
mac, ok = MACAddrFromSlice(net.HardwareAddr(tc))
assert.Check(t, !ok, "want MACAddrFromSlice(%#v) ok=false, got true", tc)
assert.Check(t, is.DeepEqual(mac.AsSlice(), []byte{0, 0, 0, 0, 0, 0}), "want MACAddrFromSlice(%#v) = %#v, got %#v", tc, []byte{0, 0, 0, 0, 0, 0}, mac.AsSlice())
}
}
func TestParseMAC(t *testing.T) {
mac, err := ParseMAC("01:02:03:04:05:06")
assert.Check(t, err)
assert.Check(t, is.DeepEqual(mac.AsSlice(), []byte{0x01, 0x02, 0x03, 0x04, 0x05, 0x06}))
// Invalid MAC address
_, err = ParseMAC("01:02:03:04:05:06:07:08")
assert.Check(t, is.ErrorContains(err, "not a MAC-48 address"))
}
func TestMACAddr_String(t *testing.T) {
mac := MACAddrFrom6([6]byte{0x01, 0x02, 0x03, 0x04, 0x05, 0x06})
assert.Check(t, is.Equal(mac.String(), "01:02:03:04:05:06"))
assert.Check(t, is.Equal(MACAddr(0).String(), "00:00:00:00:00:00"))
}
func TestIPMACFrom(t *testing.T) {
assert.Check(t, is.Equal(IPMACFrom(netip.Addr{}, 0), IPMAC{}))
ipm := IPMACFrom(
netip.MustParseAddr("11.22.33.44"),
MACAddrFrom6([6]byte{0x01, 0x02, 0x03, 0x04, 0x05, 0x06}),
)
assert.Check(t, is.Equal(ipm.IP(), netip.MustParseAddr("11.22.33.44")))
assert.Check(t, is.Equal(ipm.MAC(), MACAddrFrom6([6]byte{0x01, 0x02, 0x03, 0x04, 0x05, 0x06})))
}


@@ -13,14 +13,14 @@ import (
// The zero value is an empty set matrix ready to use.
//
// SetMatrix values are safe for concurrent use.
type SetMatrix[T comparable] struct {
matrix map[string]mapset.Set[T]
type SetMatrix[K, V comparable] struct {
matrix map[K]mapset.Set[V]
mu sync.Mutex
}
// Get returns the members of the set for a specific key as a slice.
func (s *SetMatrix[T]) Get(key string) ([]T, bool) {
func (s *SetMatrix[K, V]) Get(key K) ([]V, bool) {
s.mu.Lock()
defer s.mu.Unlock()
set, ok := s.matrix[key]
@@ -31,7 +31,7 @@ func (s *SetMatrix[T]) Get(key string) ([]T, bool) {
}
// Contains is used to verify if an element is in a set for a specific key.
func (s *SetMatrix[T]) Contains(key string, value T) (containsElement, setExists bool) {
func (s *SetMatrix[K, V]) Contains(key K, value V) (containsElement, setExists bool) {
s.mu.Lock()
defer s.mu.Unlock()
set, ok := s.matrix[key]
@@ -43,13 +43,13 @@ func (s *SetMatrix[T]) Contains(key string, value T) (containsElement, setExists
// Insert inserts the value in the set of a key and returns whether the value is
// inserted (was not already in the set) and the number of elements in the set.
func (s *SetMatrix[T]) Insert(key string, value T) (inserted bool, cardinality int) {
func (s *SetMatrix[K, V]) Insert(key K, value V) (inserted bool, cardinality int) {
s.mu.Lock()
defer s.mu.Unlock()
set, ok := s.matrix[key]
if !ok {
if s.matrix == nil {
s.matrix = make(map[string]mapset.Set[T])
s.matrix = make(map[K]mapset.Set[V])
}
s.matrix[key] = mapset.NewThreadUnsafeSet(value)
return true, 1
@@ -59,7 +59,7 @@ func (s *SetMatrix[T]) Insert(key string, value T) (inserted bool, cardinality i
}
// Remove removes the value in the set for a specific key.
func (s *SetMatrix[T]) Remove(key string, value T) (removed bool, cardinality int) {
func (s *SetMatrix[K, V]) Remove(key K, value V) (removed bool, cardinality int) {
s.mu.Lock()
defer s.mu.Unlock()
set, ok := s.matrix[key]
@@ -80,7 +80,7 @@ func (s *SetMatrix[T]) Remove(key string, value T) (removed bool, cardinality in
}
// Cardinality returns the number of elements in the set for a key.
func (s *SetMatrix[T]) Cardinality(key string) (cardinality int, ok bool) {
func (s *SetMatrix[K, V]) Cardinality(key K) (cardinality int, ok bool) {
s.mu.Lock()
defer s.mu.Unlock()
set, ok := s.matrix[key]
@@ -93,7 +93,7 @@ func (s *SetMatrix[T]) Cardinality(key string) (cardinality int, ok bool) {
// String returns the string version of the set.
// The empty string is returned if there is no set for key.
func (s *SetMatrix[T]) String(key string) (v string, ok bool) {
func (s *SetMatrix[K, V]) String(key K) (v string, ok bool) {
s.mu.Lock()
defer s.mu.Unlock()
set, ok := s.matrix[key]
@@ -104,10 +104,10 @@ func (s *SetMatrix[T]) String(key string) (v string, ok bool) {
}
// Keys returns all the keys in the map.
func (s *SetMatrix[T]) Keys() []string {
func (s *SetMatrix[K, V]) Keys() []K {
s.mu.Lock()
defer s.mu.Unlock()
keys := make([]string, 0, len(s.matrix))
keys := make([]K, 0, len(s.matrix))
for k := range s.matrix {
keys = append(keys, k)
}
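The change above generalizes the key type from `string` to any comparable `K`, so callers can index the matrix by whatever identifier they already have. A dependency-free, non-thread-safe sketch of the Insert semantics, using a plain map in place of mapset:

```go
package main

import "fmt"

// SetMatrix maps keys to sets of values; a simplified sketch of
// libnetwork's internal/setmatrix with the new generic key type
// (no mutex, plain map instead of mapset).
type SetMatrix[K, V comparable] struct {
	matrix map[K]map[V]struct{}
}

// Insert adds value to the set for key, reporting whether the value was
// newly inserted and the resulting set cardinality.
func (s *SetMatrix[K, V]) Insert(key K, value V) (inserted bool, cardinality int) {
	if s.matrix == nil {
		s.matrix = make(map[K]map[V]struct{})
	}
	set, ok := s.matrix[key]
	if !ok {
		s.matrix[key] = map[V]struct{}{value: {}}
		return true, 1
	}
	if _, exists := set[value]; exists {
		return false, len(set)
	}
	set[value] = struct{}{}
	return true, len(set)
}

func main() {
	var s SetMatrix[int, string]   // keys no longer have to be strings
	fmt.Println(s.Insert(42, "a")) // true 1
	fmt.Println(s.Insert(42, "a")) // false 1
	fmt.Println(s.Insert(42, "b")) // true 2
}
```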


@@ -9,7 +9,7 @@ import (
)
func TestSetSerialInsertDelete(t *testing.T) {
var s SetMatrix[string]
var s SetMatrix[string, string]
b, i := s.Insert("a", "1")
if !b || i != 1 {
@@ -135,7 +135,7 @@ func TestSetSerialInsertDelete(t *testing.T) {
}
}
func insertDeleteRotuine(ctx context.Context, endCh chan int, s *SetMatrix[string], key, value string) {
func insertDeleteRotuine(ctx context.Context, endCh chan int, s *SetMatrix[string, string], key, value string) {
for {
select {
case <-ctx.Done():
@@ -158,7 +158,7 @@ func insertDeleteRotuine(ctx context.Context, endCh chan int, s *SetMatrix[strin
}
func TestSetParallelInsertDelete(t *testing.T) {
var s SetMatrix[string]
var s SetMatrix[string, string]
parallelRoutines := 6
endCh := make(chan int)
// Let the routines running and competing for 10s


@@ -43,6 +43,13 @@ var (
onReloaded []*func() // callbacks when Firewalld has been reloaded
)
// UsingFirewalld returns true if iptables rules will be applied via firewalld's
// passthrough interface.
func UsingFirewalld() bool {
_ = initCheck()
return firewalldRunning
}
// firewalldInit initializes firewalld management code.
func firewalldInit() error {
var err error
@@ -132,6 +139,7 @@ func reloaded() {
for _, pf := range onReloaded {
(*pf)()
}
log.G(context.TODO()).Info("Firewalld reload completed")
}
// OnReloaded add callback


@@ -45,6 +45,14 @@ func TestReloaded(t *testing.T) {
}
defer fwdChain.Remove()
// This jump from the FORWARD chain prevents FWD from being deleted by
// "iptables -X", called from fwdChain.Remove().
err = iptable.EnsureJumpRule("filter", "FORWARD", "FWD")
if err != nil {
t.Fatal(err)
}
defer iptable.Raw("-D", "FORWARD", "-j", "FWD")
// copy-pasted from iptables_test:TestLink
ip1 := net.ParseIP("192.168.1.1")
ip2 := net.ParseIP("192.168.1.2")
@@ -87,17 +95,30 @@ func TestReloaded(t *testing.T) {
func TestPassthrough(t *testing.T) {
skipIfNoFirewalld(t)
rule1 := []string{
"-A", "INPUT",
"-i", "lo",
"-p", "udp",
"--dport", "123",
"-j", "ACCEPT",
}
_, err := Passthrough(Iptables, append([]string{"-A"}, rule1...)...)
err := firewalldInit()
if err != nil {
t.Fatal(err)
}
if !GetIptable(IPv4).Exists(Filter, "INPUT", rule1...) {
_, err = Passthrough(Iptables, rule1...)
if err != nil {
t.Fatal(err)
}
if !GetIptable(IPv4).Exists(Filter, rule1[1], rule1[2:]...) {
t.Fatal("rule1 does not exist")
}
rule1[0] = "-D"
_, err = Passthrough(Iptables, rule1...)
if err != nil {
t.Fatal(err)
}
if GetIptable(IPv4).Exists(Filter, rule1[1], rule1[2:]...) {
t.Fatal("rule1 still exists")
}
}


@@ -31,6 +31,17 @@ const (
Insert Action = "-I"
)
const (
// ForwardChain is the name of the FORWARD chain, used for forwarded packets.
ForwardChain = "FORWARD"
// PreroutingChain is the name of the PREROUTING chain, used for packets before routing.
PreroutingChain = "PREROUTING"
// PostroutingChain is the name of the POSTROUTING chain, used for packets after routing.
PostroutingChain = "POSTROUTING"
// OutputChain is the name of the OUTPUT chain, used for locally generated packets.
OutputChain = "OUTPUT"
)
// Policy is the default iptable policies
type Policy string
@@ -589,6 +600,14 @@ func (iptable IPTable) ExistChain(chain string, table Table) bool {
return err == nil
}
// FlushChain flush chain if it exists
func (iptable IPTable) FlushChain(table Table, chain string) error {
if !iptable.ExistChain(chain, table) {
return nil
}
return iptable.RawCombinedOutput("-t", string(table), "-F", chain)
}
// SetDefaultPolicy sets the passed default policy for the table/chain
func (iptable IPTable) SetDefaultPolicy(table Table, chain string, policy Policy) error {
if err := iptable.RawCombinedOutput("-t", string(table), "-P", chain, string(policy)); err != nil {
@@ -598,25 +617,93 @@ func (iptable IPTable) SetDefaultPolicy(table Table, chain string, policy Policy
}
// AddReturnRule adds a return rule for the chain in the filter table
func (iptable IPTable) AddReturnRule(chain string) error {
if iptable.Exists(Filter, chain, "-j", "RETURN") {
func (iptable IPTable) AddReturnRule(table Table, chain string) error {
if iptable.Exists(table, chain, "-j", "RETURN") {
return nil
}
if err := iptable.RawCombinedOutput("-A", chain, "-j", "RETURN"); err != nil {
if err := iptable.RawCombinedOutput("-t", string(table), "-A", chain, "-j", "RETURN"); err != nil {
return fmt.Errorf("unable to add return rule in %s chain: %v", chain, err)
}
return nil
}
// EnsureJumpRule ensures the jump rule is on top
func (iptable IPTable) EnsureJumpRule(fromChain, toChain string) error {
if iptable.Exists(Filter, fromChain, "-j", toChain) {
if err := iptable.RawCombinedOutput("-D", fromChain, "-j", toChain); err != nil {
return fmt.Errorf("unable to remove jump to %s rule in %s chain: %v", toChain, fromChain, err)
}
func (iptable IPTable) EnsureJumpRule(table Table, fromChain, toChain string, rule ...string) error {
if err := iptable.DeleteJumpRule(table, fromChain, toChain, rule...); err != nil {
return err
}
if err := iptable.RawCombinedOutput("-I", fromChain, "-j", toChain); err != nil {
rule = append(rule, "-j", toChain)
if err := iptable.RawCombinedOutput(append([]string{"-t", string(table), "-I", fromChain}, rule...)...); err != nil {
return fmt.Errorf("unable to insert jump to %s rule in %s chain: %v", toChain, fromChain, err)
}
return nil
}
// DeleteJumpRule deletes a rule added by EnsureJumpRule. It's a no-op if the rule
// doesn't exist.
func (iptable IPTable) DeleteJumpRule(table Table, fromChain, toChain string, rule ...string) error {
rule = append(rule, "-j", toChain)
if iptable.Exists(table, fromChain, rule...) {
if err := iptable.RawCombinedOutput(append([]string{"-t", string(table), "-D", fromChain}, rule...)...); err != nil {
return fmt.Errorf("unable to remove jump to %s rule in %s chain: %v", toChain, fromChain, err)
}
}
return nil
}
type Rule struct {
IPVer IPVersion
Table Table
Chain string
Args []string
}
// Exists returns true if the rule exists in the kernel.
func (r Rule) Exists() bool {
return GetIptable(r.IPVer).Exists(r.Table, r.Chain, r.Args...)
}
func (r Rule) cmdArgs(op Action) []string {
return append([]string{"-t", string(r.Table), string(op), r.Chain}, r.Args...)
}
func (r Rule) exec(op Action) error {
return GetIptable(r.IPVer).RawCombinedOutput(r.cmdArgs(op)...)
}
// ensure appends/insert the rule to the end of the chain. If the rule already exists anywhere in the
// chain, this is a no-op.
func (r Rule) ensure(op Action) error {
if r.Exists() {
return nil
}
return r.exec(op)
}
// Append appends the rule to the end of the chain. If the rule already exists anywhere in the
// chain, this is a no-op.
func (r Rule) Append() error {
return r.ensure(Append)
}
// Insert inserts the rule at the head of the chain. If the rule already exists anywhere in the
// chain, this is a no-op.
func (r Rule) Insert() error {
return r.ensure(Insert)
}
// Delete deletes the rule from the kernel. If the rule does not exist, this is a no-op.
func (r Rule) Delete() error {
if !r.Exists() {
return nil
}
return r.exec(Delete)
}
func (r Rule) String() string {
cmd := append([]string{"iptables"}, r.cmdArgs("-A")...)
if r.IPVer == IPv6 {
cmd[0] = "ip6tables"
}
return strings.Join(cmd, " ")
}


@@ -3,13 +3,16 @@
package iptables
import (
"fmt"
"net"
"os/exec"
"strconv"
"strings"
"testing"
"github.com/docker/docker/internal/testutils/netnsutils"
"golang.org/x/sync/errgroup"
"gotest.tools/v3/assert"
)
const (
@@ -298,3 +301,37 @@ func TestExistsRaw(t *testing.T) {
}
}
}
func TestFlushChain(t *testing.T) {
if UsingFirewalld() {
t.Skip("firewalld in host netns cannot create rules in the test's netns")
}
defer netnsutils.SetupTestOSContext(t)()
iptable := GetIptable(IPv4)
chain := "TESTFLUSHCHAIN"
table := Filter
// Ensure the chain exists
assert.NilError(t, iptable.RemoveExistingChain(chain, table))
_, err := iptable.NewChain(chain, table, false)
assert.NilError(t, err)
// Add a rule to the chain
rule := Rule{IPVer: IPv4, Table: table, Chain: chain,
Args: []string{"-j", "ACCEPT"}}
assert.NilError(t, rule.Insert())
// Flush the chain
assert.NilError(t, iptable.FlushChain(table, chain))
// Check that the chain exists and is empty (only the chain definition remains)
out, err := exec.Command("iptables", "-t", string(table), "-S", chain).CombinedOutput()
assert.NilError(t, err)
rulesCount := strings.Count(string(out), fmt.Sprintf("-A %s ", chain))
assert.Check(t, rulesCount == 0)
// Cleanup
_ = iptable.RemoveExistingChain(chain, table)
}

Some files were not shown because too many files have changed in this diff.