The call from Daemon.create -> Daemon.setHostConfig acquired
container.Lock, but didn't need to because the container is
newly created and solely owned by the caller. The call from
Daemon.restore did not acquire the lock.
Signed-off-by: Rob Murray <rob.murray@docker.com>
Starting with kernel v6.12, kernel memory TCP accounting is deprecated for cgroups v1.
Note: kernel memory TCP accounting is not supported by cgroups v2.
See d046ff46ee
Signed-off-by: Austin Vazquez <austin.vazquez@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
kernel-memory limits are not supported in cgroups v2, and were obsoleted in
[kernel v5.4], producing a `ENOTSUP` in kernel v5.16. Support for this option
was removed in runc and other runtimes, as various LTS kernels contained a
broken implementation, resulting in unpredictable behavior.
We deprecated this option in [moby@b8ca7de], producing a warning when used,
and actively ignore the option since [moby@0798f5f].
Given that setting this option had no effect in most situations, we should
just remove this option instead of continuing to handle it with the expectation
that a runtime may still support it.
Note that we still support RHEL 8 (kernel 4.18) and RHEL 9 (kernel 5.14). We
no longer build packages for Ubuntu 20.04 (kernel 5.4) and Debian Bullseye 11
(kernel 5.10), which still have an LTS / ESM programme, but for those it would
only impact situations where a runtime is used that still supports it, and
an old API version was used.
[kernel v5.4]: https://github.com/torvalds/linux/commit/0158115f702b0ba208ab0
[moby@b8ca7de]: b8ca7de823
[moby@0798f5f]: 0798f5f5cf
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Libnetwork passes a map[string]any to the bridge driver's Register
function. This forces the daemon to convert its configuration into a
map, and the driver to convert that map back into a struct.
This is unnecessary complexity, and makes it harder to track down where
and how bridge driver configuration fields are set.
Refactor libnetwork to let the daemon register the bridge.Configuration
directly through a new option `OptionBridgeConfig`.
The bridge driver now takes a `Configuration` param that needs no
special treatment.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Since commit 4e246efcd, individual portmappers are responsible for
setting up firewall rules and proxying according to their needs.
This change moves the responsibility back to the bridge driver, removing
unnecessary code duplication across portmappers. For now, only the nat
portmapper takes advantage of this.
This partially reverts commit 4e246efcd.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
These errors were not used as sentinel error, and used as any other
"invalid parameter" / "invalid argument" error, so remove them, and
just produce errors where used.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Packets with the given firewall mark are accepted by the bridge
driver's filter-FORWARD rules.
The value can either be an integer mark, or it can include a
mask in the format "<mark>/<mask>".
Signed-off-by: Rob Murray <rob.murray@docker.com>
The bridge driver currently determines if hairpin mode is enabled by
checking whether the userland proxy is enabled, and if the binary path
is set to a non-empty string. It's used (amongst other things) by the
driver to decide whether 6-to-4 portmappings are supported, while it
normalizes port bindings.
As the userland proxy is going to be handled by the nat portmapper,
proxy-related params will be removed from the bridge driver, but the
port binding normalization will stay in the bridge driver.
So, introduce a new Hairpin config flag, and reimplement the original
logic in the daemon, when creating the bridge config.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
These comments were added to enforce using the correct import path for
our packages ("github.com/docker/docker", not "github.com/moby/moby").
However, when working in go module mode (not GOPATH / vendor), they have
no effect, so their impact is limited.
Remove these imports in preparation of migrating our code to become an
actual go module.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Per-network option com.docker.network.bridge.trusted-host-interfaces
accepts a list of interfaces that are allowed to route
directly to a container's published ports in a bridge
network with nat enabled.
This daemon level option disables direct access filtering,
enabling direct access to published ports on container
addresses in all bridge networks, via all host interfaces.
It overlaps with short-term env-var workaround:
DOCKER_INSECURE_NO_IPTABLES_RAW=1
- it does not allow packets sent from outside the host to reach
ports published only to 127.0.0.1
- it will outlive iptables (the workaround was initially intended
for hosts that do not have kernel support for the "raw" iptables
table).
Signed-off-by: Rob Murray <rob.murray@docker.com>
Go maintainers started to unconditionally update the minimum go version
for golang.org/x/ dependencies to go1.23, which means that we'll no longer
be able to support any version below that when updating those dependencies;
> all: upgrade go directive to at least 1.23.0 [generated]
>
> By now Go 1.24.0 has been released, and Go 1.22 is no longer supported
> per the Go Release Policy (https://go.dev/doc/devel/release#policy).
>
> For golang/go#69095.
This updates our minimum version to go1.23, as we won't be able to maintain
compatibility with older versions because of the above.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Add an OTel span processor copying the 'trigger' baggage member
propagated through contexts to all children spans. It's used to identify
what triggered a trace / span (API call, libnet init, etc...)
All code paths that call libnet's `NewNetwork` set this baggage member
with a unique value.
For instance, this can be used to distinguish bridge's `createNetwork`
spans triggered by daemon / libnet initialization from custom network
creation triggerd by an API call.
Two util functions are added to wrap `baggage.New` and
`baggage.NewMemberRaw` to make it easier to deal with baggage and
members by panicking on error. These should not be used with dynamic
values.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Plumb context from the API down to libnet driver method `CreateNetwork`,
and add an OTel span to the bridge driver's `createNetwork` method.
Include a few attributes describing the network configuration (e.g.
IPv4/IPv6, ICC, internal and MTU).
A new util function, `RecordStatus`, is added to the `otelutil` package
to easily record any error, and update the span status accordingly.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
Now a gratuitous/unsolicted ARP is sent, there's no need to
use an IPv4-based MAC address to preserve arp-cache mappings
between an endpoint's IP addresses and its MAC addresses.
Because a random MAC address is used for the default bridge,
it no longer makes sense to derive container IPv6 addresses
from the MAC address. This "postIPv6" behaviour was needed
before IPv6 addresses could be configured, but not now. So,
IPv6 addresses will now be IPAM-allocated on the default
bridge network, just as they are for user-defined bridges.
Signed-off-by: Rob Murray <rob.murray@docker.com>
This requires changes in the CLI to support fully, but matches our other boolean option handling (`no-new-privileges`).
Signed-off-by: Tianon Gravi <admwiggin@gmail.com>
This makes `WritableCgroups` a pointer so we can error when it's specified in invalid configurations (both rootless and user namespaces).
Signed-off-by: Tianon Gravi <admwiggin@gmail.com>
Fixes#42040Closes#42043
Rather than making cgroups read-write by default, instead have a flag
for making it possible.
Since these security options are passed through the cli to daemon API,
no changes are needed to docker-cli.
Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>