57 Commits

Author SHA1 Message Date
Paweł Gronowski
1489cb3ae0 Merge pull request #51722 from vvoland/modernize
Modernize Go code
2025-12-16 12:38:36 +00:00
Rob Murray
1e209e788b Continue to backfill empty PortBindings in API 1.53
- introduced by commit 0ca7ac3 ("daemon: backfill empty PBs
  slices for backward compat")

Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-12-16 11:03:58 +00:00
Paweł Gronowski
3df05205f4 modernize: Use range int
Added in Go 1.22

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2025-12-15 18:56:34 +01:00
Paweł Gronowski
a25907b485 modernize: Prefer strings.SplitSeq instead of Split
Avoids extra allocations. Added in Go 1.24.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2025-12-15 18:56:33 +01:00
Austin Vazquez
aa380c26fc Merge pull request #51304 from thaJeztah/faulty_defers
fix some faulty defers in tests
2025-10-28 18:11:19 -05:00
Sebastiaan van Stijn
1f5c82b9fa client: add option and output structs for various container methods
Add option- and output structs for;

- Client.ContainerKill
- Client.ContainerPause
- Client.ContainerRemove
- Client.ContainerResize
- Client.ContainerRestart
- Client.ContainerStart
- Client.ContainerStop
- Client.ContainerUnpause

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-10-27 23:46:28 +01:00
Sebastiaan van Stijn
6040a2f686 fix some faulty defers in tests
Calling "defer assert.NilError(t, someCommand())" executes "someCommand()"
immediately, but doesn't assert the error until the defer.
https://go.dev/play/p/EO--y7OYerg

    package main

    import (
        "errors"
        "testing"

        "gotest.tools/v3/assert"
    )

    func TestDefer(t *testing.T) {
        doSomething := func() error {
            t.Log("doSomething failed with an error!")
            return errors.New("foo error")
        }

        defer assert.NilError(t, doSomething())

        t.Log("running test")
        t.Log("running test")
        t.Log("running test")
    }

Produces:

    === RUN   TestDefer
        prog_test.go:12: doSomething failed with an error!
        prog_test.go:18: running test
        prog_test.go:19: running test
        prog_test.go:20: running test
        prog_test.go:21: assertion failed: error is not nil: foo error
    --- FAIL: TestDefer (0.00s)
    FAIL

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-10-27 12:39:21 +01:00
Austin Vazquez
c646091d57 api: move container port type to network package
Signed-off-by: Austin Vazquez <austin.vazquez@docker.com>
2025-10-03 17:30:42 -05:00
Cory Snider
fd4329a620 api/types/container: use netip types as appropriate
Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-10-03 21:39:14 +02:00
Cory Snider
a90adb6dc1 api/types/network: use netip types as appropriate
And generate the ServiceInfo struct from the Swagger spec.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-10-03 21:39:14 +02:00
Austin Vazquez
cb3abacc52 api/types/container: add network port and port range types
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Co-authored-by: Cory Snider <csnider@mirantis.com>
Signed-off-by: Austin Vazquez <austin.vazquez@docker.com>
2025-10-02 13:59:34 -05:00
Sebastiaan van Stijn
d3e45f8743 testutil: move back to internal
This package was originally internal, but was moved out when BuildKit
used it for its integration tests. That's no longer the case, so we
can make it internal again.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-09-08 10:08:30 +02:00
Sebastiaan van Stijn
4d20b6fe56 api/types/container: move container options to client
Move the option-types to the client and in some cases create a
copy for the backend. These types are used to construct query-
args, and not marshaled to JSON, and can be replaced with functional
options in the client.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-09-04 20:09:55 +02:00
Austin Vazquez
1b4fcb8da7 api/types/network: move CreateOptions type to client module
Signed-off-by: Austin Vazquez <austin.vazquez@docker.com>
2025-08-27 08:10:20 -05:00
Rob Murray
67ffa47090 nftables: don't enable IP forwarding
For nftables only, never enable IP forwarding on the host. Instead,
return an error on network creation if forwarding is not enabled,
required by a bridge network, and --ip-forward=true.

If IPv4 forwarding is not enabled when the daemon is started with
nftables enabled and other config at defaults, the daemon will
exit when it tries to create the default bridge.

Otherwise, network creation will fail with an error if IPv4/IPv6
forwarding is not enabled when a network is created with IPv4/IPv6.

It's the user's responsibility to configure and secure their host
when they run Docker with nftables.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-08-08 18:43:35 +01:00
Derek McGowan
f74e5d48b3 Create github.com/moby/moby/v2 module
Signed-off-by: Derek McGowan <derek@mcg.dev>
2025-07-31 10:13:29 -07:00
Sebastiaan van Stijn
d58dc493fe replace direct uses of nat types for api/types/container aliases
Follow-up to 494677f93f, which added
the aliases, but did not yet replace our own use of the nat types.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-07-31 02:57:39 +02:00
Derek McGowan
1da417980c Move api/stdcopy to api/pkg/stdcopy
Signed-off-by: Derek McGowan <derek@mcg.dev>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-07-30 14:22:30 +02:00
Derek McGowan
6514282136 Move internal/testutils/networking to integration/internal/testutils/networking
Signed-off-by: Derek McGowan <derek@mcg.dev>
2025-07-24 12:16:06 -07:00
Rob Murray
b0c22a931d Merge pull request #50476 from robmry/bridge-accept-fw-mark
Add option --bridge-accept-fwmark
2025-07-23 10:55:36 +01:00
Rob Murray
cf1695bef1 Add option --bridge-accept-fwmark
Packets with the given firewall mark are accepted by the bridge
driver's filter-FORWARD rules.

The value can either be an integer mark, or it can include a
mask in the format "<mark>/<mask>".

Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-07-22 19:15:02 +01:00
Rob Murray
30752f0780 Always allow access to routed endpoints
When an endpoint in a gateway mode "nat" network is selected
as a container's default gateway, the bridge driver sets up
bindings between host and container ports (NAT, userland proxy
etc).

When gateway mode "routed" was added as an alternative to
the default "nat" mode - port bindings followed the same rules.

But, unlike "nat" mode, there's no host port binding to set
up - there's routing between remote client and the container,
so it doesn't matter what the default gateway is.

So, in "routed" mode, set up the rules to make a container's
published ports accessible when the endpoint is added, and
remove those rules when the endpoint is removed (when the
container is disconnected from the endpoint's network).

Port mappings are only provided by ProgramExternalConnectivity,
they can't be set up during the Join. So, include routed
bindings in the port bindings mode that's stored as part of
endpoint state - and use that to work out whether to add or
remove bindings.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-07-22 16:51:59 +01:00
Sebastiaan van Stijn
20d594fb79 deprecate pkg/stdcopy, move to api/stdcopy
The stdcopy package is used to produce and read multiplexed streams for
"attach" and "logs". It is used both by the API server (to produce), and
the client (to read / de-multiplex).

Move it to the api package, so that it can be included in the api module.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-07-21 21:41:39 +02:00
Derek McGowan
c47afd41c8 Create github.com/moby/moby/client module
Signed-off-by: Derek McGowan <derek@mcg.dev>
2025-07-21 09:30:26 -07:00
Derek McGowan
afd6487b2e Create github.com/moby/moby/api module
Signed-off-by: Derek McGowan <derek@mcg.dev>
2025-07-21 09:30:05 -07:00
Derek McGowan
7a720df61f Move libnetwork to daemon/libnetwork
Signed-off-by: Derek McGowan <derek@mcg.dev>
2025-07-14 09:25:23 -07:00
Albin Kerouanton
d229c1ba31 libnet/d/bridge: norm pb reqs before forming groups
Port bindings are currently sorted — to form groups that should be
mapped in one go — and then normalized by `configurePortBindingIPv[4|6]`.
However, gw_modes might not be the same for IPv4/v6, so the upcoming
split of NATed / routed portmappers will require that they're processed
independently.

With this commit, PBs are now normalized (by calling the `configure...`
funcs), and then sorted. The sort func is updated to group routed PBs.

`needSamePort` was comparing the container's IP address, but this field
was never set by the time it's called. Now it's set, and has a different
value when IPv4 / IPv6 portmappings are mixed, so remove it from the
comparison.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2025-07-09 00:05:28 +02:00
Rob Murray
4ccbca1efe Add TestRoutedNonGateway
Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-06-18 17:21:57 +01:00
Rob Murray
d6620915db portallocator: always check for ports allocated for 0.0.0.0/::
We set SO_REUSEADDR on sockets used for host port mappings by
docker-proxy - which means it's possible to bind the same port
on a specific address as well as 0.0.0.0/::.

For TCP sockets, an error is raised when listen() is called on
both sockets - and the port allocator will be called again to
avoid the clash (if the port was allocated from a range, otherwise
the container will just fail to start).

But, for UDP sockets, there's no listen() - so take more care
to avoid the clash in the portallocator.

The port allocator keeps a set of allocated ports for each of
the host IP addresses it's seen, including 0.0.0.0/::. So, if a
mapping to 0.0.0.0/:: is requested, find a port that's free in
the range for each of the known IP addresses (but still only
mark it as allocated against 0.0.0.0/::). And, if a port is
requested for specific host addresses, make sure it's also
free in the corresponding 0.0.0.0/:: set (but only mark it as
allocated against the specific addresses - because the same
port can be allocated against a different specific address).

Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-05-28 14:00:33 +01:00
Rob Murray
19dc38f79b Listen on mapped host ports before mapping more ports
Because we set SO_REUSEADDR on sockets for host ports, if there
are port mappings for INADDR_ANY (the default) as well as for
specific host ports - bind() cannot be used to detect clashes.

That means, for example, on daemon startup, if the port allocator
returns the first port in its ephemeral range for a specific host
adddress, and the next port mapping is for 0.0.0.0 - the same port
is returned and both bind() calls succeed. Then, the container
fails to start later when listen() spots the problem and it's too
late to find another port.

So, bind and listen to each set of ports as they're allocated
instead of just binding.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-05-28 11:38:59 +01:00
Rob Murray
e48ea1c6e0 Make integration tests ready for nftables
Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-05-27 17:50:03 +01:00
Rob Murray
f9f0db0789 Add nftables support to testutil SetFilterForwardPolicies
Because nftables tables/chain aren't fixed, like they are
in iptables - this change makes an assumption about the
bridge driver's naming.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-05-27 17:50:03 +01:00
Rob Murray
44a3453d73 Add daemon option --allow-direct-routing
Per-network option com.docker.network.bridge.trusted-host-interfaces
accepts a list of interfaces that are allowed to route
directly to a container's published ports in a bridge
network with nat enabled.

This daemon level option disables direct access filtering,
enabling direct access to published ports on container
addresses in all bridge networks, via all host interfaces.

It overlaps with short-term env-var workaround:
  DOCKER_INSECURE_NO_IPTABLES_RAW=1
- it does not allow packets sent from outside the host to reach
  ports published only to 127.0.0.1
- it will outlive iptables (the workaround was initially intended
  for hosts that do not have kernel support for the "raw" iptables
  table).

Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-04-30 20:59:28 +01:00
Rob Murray
a94643a1b3 bridge: add option com.docker.network.bridge.trusted_host_interfaces
trusted_host_interface have access to published ports on container
addresses - enabling direct routing to the container via those
interfaces.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-04-30 20:59:28 +01:00
Rob Murray
4fbfb618c3 Skip flaky part of TestAccessPublishedPortFromHost
With firewalld enabled in CI, TestAccessPublishedPortFromHost/userland-proxy=true/IPv6=true
consistently fails when trying to use a link-local address on
eth0 (it's ok for the ULL added by the test).

In a local moby dev container, it passes - although it sometimes
fails when making its request to the host's ::1.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-03-27 08:36:09 +00:00
Rob Murray
86eff82789 Firewalld: Skip tests that run dockerd in an L3Segment
The daemon runs in a separate netns, but when it wants to create
an iptables rule it sends a dbus message to firewalld - which is
running in the host's netns.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-03-27 08:36:09 +00:00
Rob Murray
cf3e42abaf Add an opt-out for iptables 'raw' rules
For kernels that don't have CONFIG_IP_NF_RAW, if the env
var DOCKER_INSECURE_NO_IPTABLES_RAW is set to "1", don't
try to create raw rules.

This means direct routing to published ports is possible
from other hosts on the local network, even if the port
is published to a loopback address.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-03-10 18:32:49 +00:00
Paweł Gronowski
0a58c73e0d integration/net: Retry TestAccessPublishedPortFromAnotherNetwork
Allow each test case to be retried up to 5 times.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2025-03-07 14:57:00 +01:00
Albin Kerouanton
d216084185 libnet/d/bridge: drop remote connections to port mapped on lo
Traditionally when Linux receives remote packets with daddr set to a
loopback address, it reject them as 'martians'. However, when a NAT rule
is applied through iptables this doesn't happen. Our current DNAT rule
used to map host ports to containers is applied unconditionally, even
for such 'martian' packets.

This means a neighbor host (ie. a host connected to the same L2
segment) can send packets to a port mapped on a loopback address. The
purpose of publishing on a loopback address is to make ports
inaccessible to remote hosts -- lack of proper filtering defeats that.

This commit adds an iptables rule to the raw-PREROUTING chain to drop
packets with a loopback dest address and coming from any interface other
than lo.

To accomodate WSL2 mirrored mode, another rule is inserted beforehand to
specifically accept packets coming from the loopback0 interface.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2025-01-27 18:41:20 +01:00
Albin Kerouanton
27adcd596b libnet/d/bridge: port mappings: drop direct-access when gw_mode=nat
When a NAT-based port mapping is created, the daemon adds a DNAT rule in
nat-DOCKER to replace the dest addr with the container IP. However, the
daemon never sets up rules to filter packets destined directly to the
container port. This allows a rogue neighbor (ie. a host that shares a
L2 segment with the host) to send packets directly to the container on
its container-side exposed port.

For instance, if container port 5000 is mapped to host port 6000, a
neighbor could send packets directly to the container on its port 5000.

Since nat-DOCKER mangles the dest addr, and the nat table forbids DROP
rules, this change adds a new rule in the raw-PREROUTING chain to filter
ingress connections targeting the container's IP address.

This filtering is only done when gw_mode=nat. For the unprotected
variant, no filtering is done.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2025-01-27 18:41:20 +01:00
Albin Kerouanton
8474153e13 integration: accessing mappings from another docker network
Commit fc7caf96d reverted 433b1f9b1 as it was introducing a regression,
ie. containers couldn't reach ports published on the host using their
gateway's IP address or the host IP address.

These scenarios are now tested.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2025-01-27 18:41:20 +01:00
Albin Kerouanton
fc7caf96d2 Revert "libnet/d/bridge: port mappings: filter by input iface"
This reverts commit 433b1f9b17.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2025-01-20 14:11:51 +01:00
Albin Kerouanton
433b1f9b17 libnet/d/bridge: port mappings: filter by input iface
When a NAT-based port mapping is created with a HostIP specified, we
insert a DNAT rule in nat-DOCKER to replace the dest addr with the
container IP. Then, in filter chains, we allow access to the container
port for any packet not coming from the container's network itself (if
hairpinning is disabled), nor from another host bridge.

However we don't set any rule that prevents a rogue neighbor that shares
a L2 segment with the host, but not the one where the port binding is
expected to be published, from sending packets destined to that HostIP.

For instance, if a port binding is created with HostIP == '127.0.0.1',
this port should not be accessible from anything but the lo interface.
That's currently not the case and this provides a false sense of
security.

Since nat-DOCKER mangles the dest addr, and the nat table rejects DROP
rules, this change adds rules into raw-PREROUTING to filter ingress
packets destined to mapped ports based on the input interface, the dest
addr and the dest port.

Interfaces are dynamically resolved when packets hit the host, thanks
to iptables' addrtype extension. This extension does a fib lookup of the
dest addr and checks that it's associated with the interface reached.

Also, when a proxy-based port mapping is created, as is the case when an
IPv6 HostIP is specified but the container is only IPv4-capable, we
don't set any sort of filtering. So the same issue might happen. The
reason is a bit different - in that case, that's just how the kernel
works. But, in order to stay consistent with NAT-based mappings, these
rules are also applied.

The env var `DOCKER_DISABLE_INPUT_IFACE_FILTERING` can be set to any
true-ish value to globally disable this behavior.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2025-01-13 19:04:25 +01:00
Rob Murray
3bf9a80818 Rename L3Segment Host.Run -> Host.MustRun
Like netip.MustParseIP, it fails on error.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-12-17 09:14:25 +00:00
Rob Murray
0aba67203a Implement gateway mode "nat-unprotected"
Same as "nat" mode, there's masquerading and port mapping from the
host - but no port/protocol filtering for direct access to the
container's address from remote hosts.

This is the old default behaviour for IPv4 when the filter-FORWARD
chain's default policy was "ACCEPT" (the daemon would only set it
to "DROP" when it set sysctl "ip_forward" itself, but it didn't set
up DROP rules for unpublished ports).

Now, port filtering doesn't depend on the filter-FORWARD policy. So,
this mode is added as a way to restore the old/surprising/insecure
behaviour for anyone who's depending on it. Networks will need to
be re-created with this new gateway mode.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-11-28 19:33:37 +00:00
Akihiro Suda
fb6e650ab9 integration: add wait
Cherry-picked several WIP commits from
b0a592798f/

Originally-authored-by: Rodrigo Campos <rodrigoca@microsoft.com>
Co-Authored-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-11-27 15:52:49 +01:00
Rob Murray
bf251c33d0 Only masquerade access to own published ports for userland-proxy=false
When a container sends a packet to one of its own published ports on the
host, it's normally picked up by the userland proxy and sent back.

When the userland proxy is disabled, a masquerade rule is needed in
order for responses to the container to have the host's source address.

The masquerade rule matches the container's address as source and dest,
and the published port as the dest. It's only used for the no-proxy
case.

So, when the userland proxy is enabled, don't create the masquerade
rule.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-11-12 12:37:25 +00:00
Rob Murray
aba8df74a1 Add TestDirectRoutingOpenPorts
Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-10-22 14:38:13 +01:00
Albin Kerouanton
6649a32cb3 Merge pull request #48676 from robmry/48664_br_netfilter_noproxy
Enable bridge netfiltering if userland-proxy=false
2024-10-17 15:31:45 +02:00
Rob Murray
0548fe251c Enable bridge netfiltering if userland-proxy=false
In release 27.0, ip6tables was enabled by default. That caused a
problem on some hosts where iptables was explicitly disabled and
loading the br_netfilter module (which loads with its nf-call-iptables
settings enabled) caused user-defined iptables rules to block traffic
on bridges, breaking inter-container communication.

In 27.3.0, commit 5c499fc4b2 delayed
loading of the br_netfilter module until it was needed. The load
now happens in the function that sets bridge-nf-call-ip[6]tables when
needed. It was only called for icc=false networks.

However, br_netfilter is also needed when userland-proxy=false.
Without it, packets addressed to a host-mapped port for a container
on the same network are not DNAT'd properly (responses have the server
container's address instead of the host's).

That means, in all releases including 26.x, if br_netfilter was loaded
before the daemon started - and the OS/user/other-application had
disabled bridge-nf-call-ip[6]tables, it would not be enabled by the
daemon. So, ICC would fail for host-mapped ports with the userland-proxy
disabled.

The change in 27.3.0 made this worse - previously, loading br_netfilter
whenever iptables/ip6tables was enabled meant that bridge-netfiltering
got enabled, even though the daemon didn't check it was enabled.

So... check that br_netfilter is loaded, with bridge-nf-call-ip[6]tables
enabled, if userland-proxy=false.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-10-17 12:33:49 +01:00