Starting with commit 0d6e7cd983,
DeleteNeighbor() needs to be called with the same options as the
AddNeighbor() call that created the neighbor entry. The calls in peerdb
were modified incorrectly, resulting in the deletes failing and leaking
neighbor entries. Fix up the DeleteNeighbor calls so that the FDB entry
is deleted from the FDB instead of the neighbor table, and the neighbor
is deleted from the neighbor table instead of the FDB.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The (*driver).Join function does many things to set up overlay
networking. One of the first things it does is call
(*network).joinSandbox, which in turn calls (*driver).initSandboxPeerDB.
The initSandboxPeerDB function iterates through the peer db to add
entries to the VXLAN FDB, neighbor table and IPsec security association
database in the kernel for all known peers on the overlay network.
One of the last things the (*driver).Join function does is call
(*driver).initEncryption. The initEncryption function iterates through
the peer db to add entries to the IPsec security association database in
the kernel for all known peers on the overlay network. But the preceding
initSandboxPeerDB call already did that! The initEncryption function is
redundant and can safely be removed.
Signed-off-by: Cory Snider <csnider@mirantis.com>
In addition to being three functions in a trenchcoat, the
checkEncryption function has a very subtle implementation which is
difficult to reason about. That is not a good property for security
relevant code to have.
Replace two of the three calls to checkEncryption with conditional calls
to setupEncryption and removeEncryption, lifting the conditional logic
which was hidden away in checkEncryption into the call sites to make it
easier to reason about the code. Replace the third call with a call to a
new initEncryption function.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The setupEncryption and removeEncryption functions take several
parameters, but all call sites pass the same values for all the
parameters aside from remoteIP: values taken from fields of the driver
struct. Refactor these functions to be methods of the driver struct and
drop the redundant parameters.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Since it is not meaningful to add or remove encryption between the local
node and itself, the isLocal parameter is redundant. Setting up
encryption for all network peers is now invoked by calling
checkEncryption(nid, netip.Addr{}, true)
Calling checkEncryption with isLocal=true, add=false is now more
explicitly a no-op. It always was effectively a no-op, but that was not
easy to spot by inspection. In the world with the isLocal flag,
calls to checkEncryption where isLocal=true and add=false would have rIP
set to d.advertiseAddr. In other words, it was a request to remove
encryption parameters between the local peer and itself if peerDB had no
remote-peer entries for the network. So either the call would do
nothing, or it would remove encryption parameters that aren't used for
anything. Now the equivalent call always does nothing.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Drop the isLocal boolean parameters from the peerDB functions. Local
peers have vtep == netip.Addr{}.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The VTEP value for a peer in peerDB is only accurate for a remote peer.
The VTEP for a local peer would be the driver's advertise address, which
is not necessarily constant for the lifetime of the driver instance.
The VTEP values persisted in the peerDB entries for local peers could be
stale or missing if not kept in sync with the advertise address. And the
peerDB could get polluted with duplicate entries for local peers if the
advertise address were to change, as entries which differ only by VTEP
are considered distinct by SetMatrix. Persisting the advertise address
as the VTEP for local peers creates lots of problems that are not easy
to solve.
Stop persisting the VTEP for local peers in peerDB. Any code that needs
to know the VTEP for local peers can look that up from the source of
truth: the driver's advertise address. Use the lack of a VTEP in peerDB
entries to signify local peers, making the isLocal flag redundant.
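A minimal sketch of the idea, assuming a simplified entry type (peerEntry's
fields, newPeer and effectiveVTEP are illustrative names, not the driver's
actual code): an entry without a VTEP is local, and its VTEP is resolved
from the advertise address on demand.

```go
package main

import (
	"fmt"
	"net/netip"
)

// peerEntry is a simplified, hypothetical stand-in for a peerDB entry.
// A zero-value vtep marks the peer as local.
type peerEntry struct {
	mac  string
	vtep netip.Addr
}

// newPeer builds an entry; an empty vtep string means "local peer".
func newPeer(mac, vtep string) peerEntry {
	p := peerEntry{mac: mac}
	if vtep != "" {
		p.vtep = netip.MustParseAddr(vtep)
	}
	return p
}

// isLocal reports whether the entry describes a peer on this node.
func (p peerEntry) isLocal() bool { return !p.vtep.IsValid() }

// effectiveVTEP returns the stored VTEP for remote peers and the current
// advertise address (the source of truth) for local peers.
func effectiveVTEP(p peerEntry, advertiseAddr netip.Addr) netip.Addr {
	if p.isLocal() {
		return advertiseAddr
	}
	return p.vtep
}

func main() {
	advertise := netip.MustParseAddr("192.0.2.10")
	for _, p := range []peerEntry{
		newPeer("02:42:0a:00:00:02", ""),
		newPeer("02:42:0a:00:00:03", "192.0.2.20"),
	} {
		fmt.Println(p.mac, p.isLocal(), effectiveVTEP(p, advertise))
	}
}
```

Because nothing derived from the advertise address is stored, a change of
advertise address cannot leave stale or duplicate local entries behind.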
Signed-off-by: Cory Snider <csnider@mirantis.com>
The overlay driver's checkEncryption function configures the IPSec
parameters for the VXLAN tunnels to peer nodes. When called with
isLocal=true, it configures encryption for all peer nodes with at least
one peerDB entry. Since the local peers are also included in the peerDB,
it needs to filter those entries out. It does so by filtering out any
peer entries whose VTEP address is equal to the current local advertise
address. Trouble is, the local advertise address is not necessarily
constant. The driver tries to handle this case by calling
peerDBUpdateSelf() when the advertise address changes. This function
iterates through the peerDB and tries to update the VTEP address for all
local peer entries, but it does not actually do anything: it mutates a
temporary copy of the entry which is not persisted back into the peerDB.
(It used to be functional, but was broken when the peerDB was extended
to use SetMatrix.) So there may be cases where local peer entries are
not filtered out properly, resulting in spurious encryption parameters
being programmed into the kernel.
Filter out local peers when walking the peerDB by filtering on whether
the entry has the isLocal flag set. Remove the no-op code which attempts
to update local entries in the peerDB. No other code takes any interest
in the VTEP value for isLocal peer entries.
Signed-off-by: Cory Snider <csnider@mirantis.com>
The netip types are really useful for tracking state in the overlay
driver as they are hashable, unlike net.IP and friends, making them
directly useable as map keys. Converting between netip and net types is
fairly trivial, but fewer conversions is more ergonomic.
The NetworkDB entries for the overlay peer table encode the IP addresses
as strings. We need to parse them to some representation before
processing them further. Parse directly into netip types and pass those
values around to cut down on the number of conversions needed.
The peerDB needs to marshal the keys and entries to structs of hashable
values to be able to insert them into the SetMatrix. Use netip.Addr in
peerEntry so that peerEntry values can be directly inserted into the
SetMatrix without conversions. Use a hashable struct type as the
SetMatrix key to avoid having to marshal the whole struct to a string
and parse it back out.
Use netip.Addr as the map key for the driver's encryption map so the
values do not need to be converted to and from strings. Change the
encryption configuration methods to take netip types so the peerDB code
can pass netip values directly.
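As an illustration of why netip types are convenient here, the sketch below
keys a map directly by netip.Addr. The secMap type and its add method are
hypothetical stand-ins for the driver's encryption map, and the string
values are placeholders.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
)

// toAddr converts a net.IP into a netip.Addr. Unmap folds IPv4-mapped IPv6
// addresses into plain IPv4 so that equal hosts compare equal.
func toAddr(ip net.IP) (netip.Addr, bool) {
	addr, ok := netip.AddrFromSlice(ip)
	return addr.Unmap(), ok
}

// secMap is a hypothetical map keyed by the remote node address.
// netip.Addr is comparable, so no string conversion is needed for the key.
type secMap map[netip.Addr][]string

// add parses the string-encoded address once and records value under it.
func (m secMap) add(nodeIP, value string) error {
	addr, err := netip.ParseAddr(nodeIP)
	if err != nil {
		return fmt.Errorf("invalid node IP %q: %w", nodeIP, err)
	}
	addr = addr.Unmap()
	m[addr] = append(m[addr], value)
	return nil
}

func main() {
	m := secMap{}
	for _, ip := range []string{"192.0.2.1", "::ffff:192.0.2.1", "2001:db8::1"} {
		if err := m.add(ip, "entry-for-"+ip); err != nil {
			fmt.Println(err)
		}
	}
	// net.ParseIP returns the 16-byte form; toAddr normalizes it.
	addr, _ := toAddr(net.ParseIP("192.0.2.1"))
	fmt.Println(len(m), m[addr])
}
```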
Signed-off-by: Cory Snider <csnider@mirantis.com>
The Namespace keeps some state for each inserted neighbor-table entry
which is used to delete the entry (and any related entries) given only
the IP and MAC address of the entry to delete. This state is not
strictly required as the retained data is a pure function of the
parameters passed to AddNeighbor(), and the kernel can inform us whether
an attempt to add a neighbor entry would conflict with an existing
entry. Get rid of the neighbor state in Namespace. It's just one more
piece of state that can cause lots of grief if it falls out of sync with
ground truth. Require callers to call DeleteNeighbor() with the same
arguments as they had passed to AddNeighbor(). Push the responsibility
for detecting attempts to insert conflicting entries into the neighbor
table onto the kernel by using (*netlink.Handle).NeighAdd() instead of
NeighSet().
Modernize the error messages and logging in DeleteNeighbor() and
AddNeighbor().
Signed-off-by: Cory Snider <csnider@mirantis.com>
func (*Namespace) AddNeighbor is only ever called with the force
parameter set to false. Remove the parameter and eliminate dead code.
Signed-off-by: Cory Snider <csnider@mirantis.com>
Go maintainers started to unconditionally update the minimum go version
for golang.org/x/ dependencies to go1.23, which means that we'll no longer
be able to support any version below that when updating those dependencies:
> all: upgrade go directive to at least 1.23.0 [generated]
>
> By now Go 1.24.0 has been released, and Go 1.22 is no longer supported
> per the Go Release Policy (https://go.dev/doc/devel/release#policy).
>
> For golang/go#69095.
This updates our minimum version to go1.23, as we won't be able to maintain
compatibility with older versions because of the above.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Plumb context from the API down to libnet driver method `CreateNetwork`,
and add an OTel span to the bridge driver's `createNetwork` method.
Include a few attributes describing the network configuration (e.g.
IPv4/IPv6, ICC, internal and MTU).
A new util function, `RecordStatus`, is added to the `otelutil` package
to easily record any error, and update the span status accordingly.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
libnetwork/drivers/overlay/encryption.go:370:2: naked return in func `programSA` with 64 lines of code (nakedret)
return
^
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
To support this, a new netlabel is added: `com.docker.network.endpoint.ifname`.
It gives the ability to specify the interface name to be set by
netdrivers when the interface is added / moved into the container's
network namespace.
All builtin netdrivers support it.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
libnetwork/drivers/overlay/encryption.go:682:3: The copy of the 'for' variable "sp" can be deleted (Go 1.22+) (copyloopvar)
sp := sp
^
libnetwork/drivers/overlay/encryption.go:692:3: The copy of the 'for' variable "sa" can be deleted (Go 1.22+) (copyloopvar)
sa := sa
^
libnetwork/drivers/overlay/peerdb.go:134:3: The copy of the 'for' variable "pEntry" can be deleted (Go 1.22+) (copyloopvar)
pEntry := pEntry
^
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Commit a0807e7cfe configured golangci-lint
to use go1.23 semantics, which allowed linters like `copyloopvar` to lint
using the correct semantics.
go1.22 now creates a new copy of each loop variable per iteration; make sure we
don't have files that may downgrade semantics to go1.21 in case that also means
disabling that feature; https://go.dev/ref/spec#Go_1.22
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
A recent change to the vishvananda/netlink package exposes
NLM_F_DUMP_INTR in some netlink responses as an EINTR (with
no data).
Retry the requests when that happens, up to five times, before
returning the error. The limit of five is arbitrary. On most
systems even a single retry will be rare, but there's no guarantee
that a retry will succeed. So, on a very busy or misbehaving
system the error may still be returned. In most cases, this
will lead to failure of the operation being attempted (which
may lead to daemon startup failure, network initialisation
failure etc).
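The retry pattern can be sketched roughly as follows. retryOnIntr and
flakyDump are hypothetical names, and treating the limit as five attempts
in total (rather than five retries) is an assumption of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// maxAttempts bounds how often an interrupted request is tried.
const maxAttempts = 5

// retryOnIntr calls f until it returns something other than EINTR, or
// until maxAttempts is reached, and returns the last result and error.
func retryOnIntr[T any](f func() (T, error)) (T, error) {
	var (
		res T
		err error
	)
	for i := 0; i < maxAttempts; i++ {
		res, err = f()
		if !errors.Is(err, syscall.EINTR) {
			break
		}
	}
	return res, err
}

// flakyDump simulates a dump request that is interrupted a few times.
type flakyDump struct {
	failures int
	calls    int
}

func (d *flakyDump) dump() ([]string, error) {
	d.calls++
	if d.calls <= d.failures {
		return nil, syscall.EINTR
	}
	return []string{"eth0", "vxlan0"}, nil
}

func main() {
	d := &flakyDump{failures: 2}
	links, err := retryOnIntr(d.dump)
	fmt.Println(links, err, d.calls)
}
```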
Signed-off-by: Rob Murray <rob.murray@docker.com>
This repository is not yet a module (i.e., does not have a `go.mod`). This
is not problematic when building the code in GOPATH or "vendor" mode, but
when using the code as a module-dependency (in module-mode), different semantics
are applied since Go1.21, which switches Go _language versions_ on a per-module,
per-package, or even per-file basis.
A condensed summary of that logic [is as follows][1]:
- For modules that have a go.mod containing a go version directive; that
version is considered a minimum _required_ version (starting with the
go1.19.13 and go1.20.8 patch releases: before those, it was only a
recommendation).
- For dependencies that don't have a go.mod (not a module), go language
version go1.16 is assumed.
- Likewise, for modules that have a go.mod, but the file does not have a
go version directive, go language version go1.16 is assumed.
- If a go.work file is present, but does not have a go version directive,
language version go1.17 is assumed.
When switching language versions, Go _downgrades_ the language version,
which means that language features (such as generics, and `any`) are not
available, and compilation fails. For example:
# github.com/docker/cli/cli/context/store
/go/pkg/mod/github.com/docker/cli@v25.0.0-beta.2+incompatible/cli/context/store/storeconfig.go:6:24: predeclared any requires go1.18 or later (-lang was set to go1.16; check go.mod)
/go/pkg/mod/github.com/docker/cli@v25.0.0-beta.2+incompatible/cli/context/store/store.go:74:12: predeclared any requires go1.18 or later (-lang was set to go1.16; check go.mod)
Note that these fallbacks are per-module, per-package, and can even be
per-file, so _(indirect) dependencies_ can still use modern language
features, as long as their respective go.mod has a version specified.
Unfortunately, these failures do not occur when building locally (using
vendor / GOPATH mode), but will affect consumers of the module.
Obviously, this situation is not ideal, and the ultimate solution is to
move to go modules (add a go.mod), but this comes with a not insignificant
risk in other areas (due to our complex dependency tree).
We can revert to using go1.16 language features only, but this may be
limiting, and may still be problematic when (e.g.) matching signatures
of dependencies.
There is an escape hatch: adding a `//go:build` directive to files that
make use of go language features. From the [go toolchain docs][2]:
> The go line for each module sets the language version the compiler enforces
> when compiling packages in that module. The language version can be changed
> on a per-file basis by using a build constraint.
>
> For example, a module containing code that uses the Go 1.21 language version
> should have a `go.mod` file with a go line such as `go 1.21` or `go 1.21.3`.
> If a specific source file should be compiled only when using a newer Go
> toolchain, adding `//go:build go1.22` to that source file both ensures that
> only Go 1.22 and newer toolchains will compile the file and also changes
> the language version in that file to Go 1.22.
This patch adds `//go:build` directives to those files using recent additions
to the language. It's currently using go1.19 as the version to match the version
in our "vendor.mod", but we can consider being more permissive ("any" requires
go1.18 or up), or more "optimistic" (force go1.21, which is the version we
currently use to build).
For completeness' sake, note that any file _without_ a `//go:build` directive
will continue to use go1.16 language version when used as a module.
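For illustration, a file opting in through a build constraint could look
like the sketch below. The helper `first` is made up for this example; the
directive matches the go1.19 version mentioned above.

```go
//go:build go1.19

// Package main sketches a file that sets its own language version through
// a build constraint, so it can use features newer than go1.16.
package main

import "fmt"

// first returns the first element of s, or the zero value of T when s is
// empty. Generics and the predeclared "any" require go1.18 or later.
func first[T any](s []T) T {
	var zero T
	if len(s) == 0 {
		return zero
	}
	return s[0]
}

func main() {
	fmt.Println(first([]string{"a", "b"}))
}
```

The directive has to appear before the package clause, followed by a blank
line.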
[1]: 58c28ba286/src/cmd/go/internal/gover/version.go (L9-L56)
[2]: https://go.dev/doc/toolchain
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The forwarding database (fdb) of a Linux VXLAN link is restricted to
entries with destination VXLAN tunnel endpoint (VTEP) addresses of a
single address family. Which address family is permitted is set when the
link is created and cannot be modified. The overlay network driver
creates VXLAN links such that the kernel only allows fdb entries to be
created with IPv4 destination VTEP addresses. If the Swarm is configured
with IPv6 advertise addresses, creating fdb entries for remote peers
fails with EAFNOSUPPORT (address family not supported by protocol).
Make overlay networks functional over IPv6 transport by configuring the
VXLAN links for IPv6 VTEPs if the local node's advertise address is an
IPv6 address. Make encrypted overlay networks secure over IPv6 transport
by applying the iptables rules to the ip6tables when appropriate.
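A rough sketch of the selection logic, assuming the decision is driven
purely by the advertise address. The transport type and transportFor are
hypothetical names, and treating IPv4-mapped IPv6 addresses as IPv4 is an
assumption of this sketch.

```go
package main

import (
	"fmt"
	"net/netip"
)

// transport describes the settings derived from the advertise address.
type transport struct {
	ipv6VTEP bool   // configure the VXLAN link for IPv6 VTEPs
	firewall string // which iptables variant receives the rules
}

// transportFor picks the address family from the advertise address.
func transportFor(advertiseAddr string) (transport, error) {
	addr, err := netip.ParseAddr(advertiseAddr)
	if err != nil {
		return transport{}, fmt.Errorf("invalid advertise address %q: %w", advertiseAddr, err)
	}
	if addr.Unmap().Is4() {
		return transport{ipv6VTEP: false, firewall: "iptables"}, nil
	}
	return transport{ipv6VTEP: true, firewall: "ip6tables"}, nil
}

func main() {
	for _, a := range []string{"192.0.2.10", "2001:db8::10"} {
		t, err := transportFor(a)
		fmt.Println(a, t.ipv6VTEP, t.firewall, err)
	}
}
```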
Signed-off-by: Cory Snider <csnider@mirantis.com>
The github.com/containerd/containerd/log package was moved to a separate
module, which will also be used by upcoming (patch) releases of containerd.
This patch moves our own uses of the package to use the new module.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This argument was originally added in libnetwork:
03f440667f
At the time, this argument was conditional, but currently it's always set
to "true", so let's remove it.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
InterfaceOptions() returned an IfaceOptionSetter interface, which contained
"methods" that returned functional options. Such a construct could have made
sense if the functional options returned would (e.g.) be pre-populated with
information from the Sandbox (network namespace), but none of that was the case.
There was only one implementation of IfaceOptionSetter (networkNamespace),
which happened to be the same as the only implementation of Sandbox, so remove
the interface as well, to help networkNamespace with its multi-personality
disorder.
This patch:
- removes Sandbox.Bridge() and makes it a regular function (WithIsBridge)
- removes Sandbox.Master() and makes it a regular function (WithMaster)
- removes Sandbox.MacAddress() and makes it a regular function (WithMACAddress)
- removes Sandbox.Address() and makes it a regular function (WithIPv4Address)
- removes Sandbox.AddressIPv6() and makes it a regular function (WithIPv6Address)
- removes Sandbox.LinkLocalAddresses() and makes it a regular function (WithLinkLocalAddresses)
- removes Sandbox.Routes() and makes it a regular function (WithRoutes)
- removes Sandbox.InterfaceOptions().
- removes the IfaceOptionSetter interface.
Note that the IfaceOption signature was changed as well to allow returning
an error. This is not currently used, but will be used for some options
in the near future, so adding that in preparation.
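The resulting shape is the usual functional-options pattern with an error
return. The sketch below uses simplified, lowercase stand-ins (iface,
ifaceOption, withMaster, and so on); the real types and signatures may
differ, and the MAC validation is only an example of an option that can
fail.

```go
package main

import (
	"fmt"
	"net"
)

// iface is a simplified stand-in for the interface being configured.
type iface struct {
	master string
	bridge bool
	mac    net.HardwareAddr
}

// ifaceOption is a plain function; returning an error lets an option
// reject invalid input.
type ifaceOption func(i *iface) error

func withIsBridge(isBridge bool) ifaceOption {
	return func(i *iface) error {
		i.bridge = isBridge
		return nil
	}
}

func withMaster(name string) ifaceOption {
	return func(i *iface) error {
		i.master = name
		return nil
	}
}

func withMACAddress(mac string) ifaceOption {
	return func(i *iface) error {
		hw, err := net.ParseMAC(mac)
		if err != nil {
			return fmt.Errorf("invalid MAC address %q: %w", mac, err)
		}
		i.mac = hw
		return nil
	}
}

// newIface applies the options in order and stops at the first error.
func newIface(opts ...ifaceOption) (*iface, error) {
	i := &iface{}
	for _, opt := range opts {
		if err := opt(i); err != nil {
			return nil, err
		}
	}
	return i, nil
}

func main() {
	i, err := newIface(withMaster("br0"), withIsBridge(true), withMACAddress("02:42:ac:11:00:02"))
	fmt.Println(i, err)
}
```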
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
NeighborOptions() returned a NeighborOptionSetter interface, which
contained "methods" that returned functional options. Such a construct
could have made sense if the functional options returned would (e.g.)
be pre-populated with information from the Sandbox (network namespace),
but none of that was the case.
There was only one implementation of NeighborOptionSetter (networkNamespace),
which happened to be the same as the only implementation of Sandbox, so
remove the interface as well, to help networkNamespace with its multi-personality
disorder.
This patch:
- removes Sandbox.LinkName() and makes it a regular function (WithLinkName)
- removes Sandbox.Family() and makes it a regular function (WithFamily)
- removes Sandbox.NeighborOptions().
- removes the NeighborOptionSetter interface
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
"Pay no attention to the implementation behind the curtain!"
There's only one implementation of the Sandbox interface, and only one implementation
of the Info interface, and they both happen to be implemented by the same type:
networkNamespace. Let's merge these interfaces.
And now that we know that there's one, and only one Info, we can drop the charade,
and relieve the Sandbox from its dual personality.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
InvalidParameter is now compatible with errdefs.InvalidParameter. Thus,
these errors will now return a 400 status code instead of a 500.
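How a marker method can drive the status code is sketched below. The
invalidParameter interface is declared locally in the style of errdefs so
that the example stays self-contained; invalidNameError, validateName and
statusCode are hypothetical, and the real mapping in the API server may be
more involved.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// invalidParameter is a local marker interface in the style of errdefs.
type invalidParameter interface {
	InvalidParameter()
}

// invalidNameError is a hypothetical error carrying the marker method.
type invalidNameError string

func (e invalidNameError) Error() string { return "invalid name: " + string(e) }
func (invalidNameError) InvalidParameter() {}

var errBoom = errors.New("boom")

// validateName wraps the marker error to show that wrapping is preserved.
func validateName(name string) error {
	if name == "" {
		return fmt.Errorf("creating network: %w", invalidNameError("empty"))
	}
	return nil
}

// statusCode maps errors carrying the marker to 400, anything else to 500.
func statusCode(err error) int {
	var ip invalidParameter
	if errors.As(err, &ip) {
		return http.StatusBadRequest
	}
	return http.StatusInternalServerError
}

func main() {
	fmt.Println(statusCode(validateName("")), statusCode(errBoom))
}
```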
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The "Capability" type defines DataScope and ConnectivityScope fields,
but their value was set from consts in the datastore package, which
required importing that package and its dependencies for the consts
only.
This patch:
- Moves the consts to a separate "scope" package
- Adds aliases for the consts in the datastore package.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Most drivers do not implement this, so detect if a driver implements
the discoverAPI, and remove the implementation from drivers that do
not support it.
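Detecting an optional interface is a plain type assertion. In the sketch
below, driver and discoverer are simplified stand-ins; the actual
interfaces and method signatures are different.

```go
package main

import "fmt"

// driver is the minimal interface every driver implements in this sketch.
type driver interface {
	Type() string
}

// discoverer is a simplified stand-in for the optional discovery API.
type discoverer interface {
	DiscoverNew(data string) error
}

// bridgeDriver does not implement discovery at all.
type bridgeDriver struct{}

func (bridgeDriver) Type() string { return "bridge" }

// overlayDriver implements discovery and records what it was told.
type overlayDriver struct {
	seen []string
}

func (*overlayDriver) Type() string { return "overlay" }

func (d *overlayDriver) DiscoverNew(data string) error {
	d.seen = append(d.seen, data)
	return nil
}

// notify delivers data only to drivers that implement discoverer and
// returns how many drivers were notified.
func notify(drivers []driver, data string) (int, error) {
	n := 0
	for _, d := range drivers {
		disc, ok := d.(discoverer)
		if !ok {
			continue
		}
		if err := disc.DiscoverNew(data); err != nil {
			return n, err
		}
		n++
	}
	return n, nil
}

func main() {
	ov := &overlayDriver{}
	n, err := notify([]driver{bridgeDriver{}, ov}, "node-1")
	fmt.Println(n, err, ov.seen)
}
```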
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>