113 Commits

Author SHA1 Message Date
Paweł Gronowski
6a1fb46d48 Merge pull request #50169 from robmry/revert_overlay_refactoring
[28.x]: Revert overlay bug fixes / refactoring
2025-06-13 15:49:07 +00:00
Rob Murray
ea818a7f6f Revert "libnetwork/internal/setmatrix: make keys generic"
This reverts commit 0317f773a6.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-06-11 12:05:33 +01:00
Paweł Gronowski
2e25775c83 libnetwork: Replace deprecated usages
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2025-06-09 19:30:00 +02:00
Cory Snider
0317f773a6 libnetwork/internal/setmatrix: make keys generic
Make the SetMatrix key's type generic so that e.g. netip.Addr values can
be used as matrix keys.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2025-05-27 13:29:41 -04:00
Rob Murray
350bb5197a nftables: attempt a table-reload after an Apply error
Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-05-14 10:38:11 +01:00
Rob Murray
06afbe9618 Check nftables is enabled before applying updates
Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-05-14 10:38:11 +01:00
Rob Murray
976f855f68 Add OTEL span for nftables updates
Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-05-14 10:38:11 +01:00
Sebastiaan van Stijn
7c52c4d92e update go:build tags to go1.23 to align with vendor.mod
Go maintainers started to unconditionally update the minimum go version
for golang.org/x/ dependencies to go1.23, which means that we'll no longer
be able to support any version below that when updating those dependencies;

> all: upgrade go directive to at least 1.23.0 [generated]
>
> By now Go 1.24.0 has been released, and Go 1.22 is no longer supported
> per the Go Release Policy (https://go.dev/doc/devel/release#policy).
>
> For golang/go#69095.

This updates our minimum version to go1.23, as we won't be able to maintain
compatibility with older versions because of the above.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-04-17 15:43:19 +02:00
Rob Murray
619f1ddd05 Warn when no external DNS nameservers are found
Since commit 925b484 ("No fallback nameservers for internal
resolver"), if the host's resolv.conf has no nameservers and
no servers are supplied via config, the internal resolver will
not use Google's DNS - so the container will not be able to
resolve external DNS requests.

That can happen when containers are "restart-always" and the
docker daemon starts before the host's DNS is configured.

So, to highlight the issue (which may not be an error, but
probably is), include a warning in the container's resolv.conf
file.

Also, log a warning - logs currently say "No non-localhost DNS
nameservers are left in resolv.conf. Using default external
servers". But that's misleading, because it comes from the initial
resolv.conf setup, before the internal resolver is configured without
those fallbacks (we'll drop the fallbacks completely once the
default bridge has an internal resolver).

Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-04-17 10:51:06 +01:00
Rob Murray
9ba5c5d70e Merge pull request #49732 from robmry/nftables_primitives
Add utils for manipulating nftables rules
2025-04-08 09:25:41 +01:00
Sebastiaan van Stijn
6422ff2804 deprecate pkg/atomicwriter, migrate to github.com/moby/sys/atomicwriter
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-04-04 23:07:00 +02:00
Rob Murray
7d742ebf75 Add utils for manipulating nftables rules
Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-04-03 16:47:30 +01:00
Sebastiaan van Stijn
17f32e8822 libnetwork/internal/resolvconf: avoid allocations with bytes.NewBufferString (mirror)
    libnetwork/internal/resolvconf/resolvconf_test.go:63:21: avoid allocations with bytes.NewBufferString (mirror)
                rc, err := Parse(bytes.NewBuffer([]byte("options "+tc.options)), "")
                                 ^
    libnetwork/internal/resolvconf/resolvconf_test.go:106:19: avoid allocations with bytes.NewBufferString (mirror)
        rc, err := Parse(bytes.NewBuffer([]byte("nameserver 1.2.3.4")), "")
                         ^
    libnetwork/internal/resolvconf/resolvconf_test.go:214:21: avoid allocations with bytes.NewBufferString (mirror)
                rc, err := Parse(bytes.NewBuffer([]byte(input)), "")
                                 ^
    libnetwork/internal/resolvconf/resolvconf_test.go:311:21: avoid allocations with bytes.NewBufferString (mirror)
                rc, err := Parse(bytes.NewBuffer([]byte(tc.input)), "/etc/resolv.conf")
                                 ^
    libnetwork/internal/resolvconf/resolvconf_test.go:418:21: avoid allocations with bytes.NewBufferString (mirror)
                rc, err := Parse(bytes.NewBuffer([]byte(tc.input)), "/etc/resolv.conf")
                                 ^
    libnetwork/internal/resolvconf/resolvconf_test.go:492:21: avoid allocations with bytes.NewBufferString (mirror)
                rc, err := Parse(bytes.NewBuffer([]byte(content)), "/etc/resolv.conf")
                                 ^
    libnetwork/internal/resolvconf/resolvconf_test.go:535:19: avoid allocations with bytes.NewBufferString (mirror)
        rc, err := Parse(bytes.NewBuffer([]byte("nameserver 1.2.3.4.5")), "")
                         ^
    libnetwork/internal/resolvconf/resolvconf_test.go:548:19: avoid allocations with bytes.NewBufferString (mirror)
        rc, err := Parse(bytes.NewBuffer([]byte("nameserver 127.0.0.53")), "/etc/resolv.conf")
                         ^
    libnetwork/internal/resolvconf/resolvconf_test.go:569:19: avoid allocations with bytes.NewBufferString (mirror)
        rc, err := Parse(bytes.NewBuffer([]byte(input)), "/etc/resolv.conf")
                         ^

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-02-09 13:22:46 +01:00
Sebastiaan van Stijn
f9890d97d1 libnet: kvstore/boltdb: fix append to non-zero initialized length (makezero)
Changing to use binary.LittleEndian.AppendUint64, which does not require
the slice to have an initial size, and makes the code slightly more
straightforward.

    libnetwork/internal/kvstore/boltdb/boltdb.go:79:11: append to slice `dbval` with non-zero initialized length (makezero)
            dbval = append(dbval, value...)
                    ^
    libnetwork/internal/kvstore/boltdb/boltdb.go:228:11: append to slice `dbval` with non-zero initialized length (makezero)
            dbval = append(dbval, value...)
                    ^

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-02-09 13:22:45 +01:00
Paweł Gronowski
9e77d05967 add //go:build directives to prevent downgrading to go1.16 language
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2025-02-06 18:16:59 +01:00
Rob Murray
eaa84bc8f4 Send unsolicited ARP/NA requests when bringing up interfaces
Co-authored-by: Cory Snider <csnider@mirantis.com>
Co-authored-by: Rob Murray <rob.murray@docker.com>
Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-01-22 16:59:27 +00:00
Rob Murray
7e247e8b13 Add addrset.AddrSet to track a set of IP addresses
Signed-off-by: Rob Murray <rob.murray@docker.com>
2025-01-20 16:48:46 +00:00
Sebastiaan van Stijn
7864454792 pkg/ioutils: move atomic file-writers to a separate (pkg/atomicwriter) package
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-01-08 10:36:32 +01:00
Sebastiaan van Stijn
b453aa65fa update go:build tags to use go1.22
Commit a0807e7cfe configured golangci-lint
to use go1.23 semantics, which allowed linters like `copyloopvar` to lint
using the correct semantics.

go1.22 now creates a copy of variables when assigned in a loop; make sure we
don't have files that may downgrade semantics to go1.21 in case that also means
disabling that feature; https://go.dev/ref/spec#Go_1.22

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-11-12 14:02:09 +01:00
Albin Kerouanton
edcefd4efb libnet/i/kv/boltdb: fail fast in case of contention
Make sure an error is returned straight away if there's contention on
the underlying db file. This makes sure we don't reintroduce the issue
fixed in d21d088, and it will help detect contention in parallelized
tests if they're badly written. It effectively adds a new error mode to
the daemon, but if anyone faces this error, they should fix their
process manager.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-09-20 08:48:16 +02:00
Albin Kerouanton
ed08486ec7 libnet/ds: simplify datastore.New()
That function was needlessly complex. Instead of relying on a struct and
a sub-struct, it now just takes two string params: a path and a bucket
name.

Libnetwork config is now initialized with default values. A new struct
is introduced in libnetwork/config to let tests customize the path and
bucket name.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-09-20 08:48:16 +02:00
Albin Kerouanton
32b9e7b8b9 libnet/i/kv/boltdb: remove unused field 'timeout'
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-09-19 09:52:10 +02:00
Nathan Baulch
59eba0ae13 Fix typos
Signed-off-by: Nathan Baulch <nathan.baulch@gmail.com>
2024-09-06 21:53:09 +10:00
Sebastiaan van Stijn
fe307b5dab libnetwork: resolvconf: remove dependency on errdefs
The resolvconf package is imported in BuildKit, and this is the only
location that used the errdefs package outside of the client.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-08-23 17:54:21 +02:00
Sebastiaan van Stijn
afdfc04e10 libnetwork: resolvconf: remove var that shadowed import
It was only used in a single place, so we can remove the
intermediate variable.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-08-23 17:54:16 +02:00
Rob Murray
925b484a40 No fallback nameservers for internal resolver
The internal resolver now uses any nameserver found in the host's
/etc/resolv.conf as an external nameserver, and it's accessed
from the host's network namespace.

Before this change, when no external nameservers were found (so
the host had no entries in /etc/resolv.conf) Google's DNS servers
were used as fallbacks, always accessed from the container's
network namespace. If a container's initial set of endpoints had
IPv6 enabled, the IPv6 nameservers were included.

Now that we have IPv6-only networks, a similar exception would be
needed for Google's IPv4 nameservers: don't include them if
there are no IPv4 endpoints.

However, only the initial set of endpoints was considered. As
networks are connected/disconnected, IPv4 or IPv6 connectivity
may be lost.

Unlike nameservers read from the host's /etc/resolv.conf, there
is no way to tell which fallback nameservers (v4/v6) might work
from the host's namespace. So, using the host's namespace isn't
a good solution.

Since we want to get away from using fallback nameservers anyway,
this change removes them.

If a host has no /etc/resolv.conf entries, but a container does
need to use DNS, it'll need to be configured with servers via
'--dns'.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-08-06 10:33:05 +01:00
Rob Murray
d29767431c Use host netns for host's ext-dns servers
The internal resolver needs to know whether to make requests
to external DNS servers from the container's network namespace
or the host's.

The original rule was that requests were always made from the
container's namespace, unless the nameserver was on a localhost
address on the host. IPv6 nameservers were left in the container's
/etc/resolv.conf.

Commit 4e8d9a4 modified that so that IPv6 nameservers were also
used as external nameservers. The internal resolver accessed
them from the host namespace if the container's initial set of
endpoints were IPv4-only, or the nameserver address contained
a zone-id (or the nameserver was on the IPv6 loopback address).

That would break if initial IPv6 endpoints were disconnected from
the container, leaving it with no IPv6 address.

Once IPv6-only networks are allowed, another exception would need
to be made for IPv4 nameservers (they'd need to be accessed from
the host's namespace).

Instead of doing that, this change simplifies things: if a
nameserver address is read from the host's /etc/resolv.conf, it'll
work in the host's namespace. So, the rule is now simply that
nameservers read from the host's resolv.conf are accessed from the
host's namespace. DNS servers added as overrides ('--dns') are
accessed from the container's namespace (as before).

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-08-06 10:33:04 +01:00
Cory Snider
1c102140f8 libnetwork: switch to Go 1.19 atomics
Signed-off-by: Cory Snider <csnider@mirantis.com>
2024-07-08 11:09:56 -04:00
Sebastiaan van Stijn
84e43da752 libnetwork: gofumpt
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-06-27 23:18:48 +02:00
Rob Murray
6d35673504 Revert "No default nameservers for internal resolver"
This reverts commit d365702dbd.

Because buildkit doesn't run an internal resolver, and it bases its
/etc/resolv.conf on the host's ... when buildkit is run in a container
that has 'nameserver 127.0.0.11', its build containers will use Google's
DNS servers as a fallback (unless the build container uses host
networking).

Before, when the 127.0.0.11 resolver was not used for the default network,
the buildkit container would have inherited a site-local nameserver. So,
the build containers it created would also have inherited that DNS
server - and they'd be able to resolve site-local hostnames.

By replacing the site-local nameserver with Google's, we broke access
to local DNS and its hostnames.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-06-17 20:19:10 +01:00
Sebastiaan van Stijn
b7d5a42168 Update go:build comments to go1.21
Match the minimum version that's specified on our vendor.mod.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-06-13 14:59:54 +02:00
Rob Murray
d365702dbd No default nameservers for internal resolver
Don't fall back to Google's DNS servers in a network that has an
internal resolver.

Now the default bridge uses the internal resolver, the only reason a
network started by the daemon should end up without any upstream
servers is if the host's resolv.conf doesn't list any.  In this case,
the '--dns' option can be used to explicitly configure nameservers
for a container if necessary.

(Note that buildkit's containers do not have an internal resolver, so
they will still set up Google's nameservers if the host has no
resolvers that can be used in the container's namespace.)

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-06-05 20:27:24 +01:00
Albin Kerouanton
9d288b5b43 libnet/i/defaultipam: introduce a linear allocator
The previous allocator was subnetting address pools eagerly
when the daemon started, and would then just iterate over that
list whenever RequestPool was called. This was leading to high
memory usage whenever IPv6 pools were configured with a target
subnet size much smaller than the pool itself (a much longer prefix).

For instance: pool = fd00::/8, target size = /64 -- 2^(64-8) = 2^56
subnets would be generated upfront. This would take approx.
9 * 10^18 bits -- way too much for any human computer in 2024.

Another noteworthy issue, the previous implementation was allocating
a subnet, and then in another layer was checking whether the
allocation was conflicting with some 'reserved networks'. If so,
the allocation would be retried, etc. To make it worse, 'reserved
networks' would be recomputed on every iteration. This is very
inefficient, as there could be 'reserved networks' that fully overlap
a given address pool (or many!).

To fix this issue, a new field `Exclude` is added to `RequestPool`.
It's up to each driver to take it into account. Since we don't know
whether this retry loop is useful for some remote IPAM driver, it's
reimplemented bug-for-bug directly in the remote driver.

The new allocator uses a linear-search algorithm. It takes advantage
of all lists (predefined pools, allocated subnets and reserved
networks) being sorted and logically combines 'allocated' and
'reserved' through a 'double cursor' to iterate on both lists at the
same time while preserving the total order. At the same time, it
iterates over 'predefined' pools and looks for the first empty space
that would be a good fit.

Currently, the size of the allocated subnet is still dictated by
each 'predefined' pool. We should consider hardcoding that size
instead, and let users specify what subnet size they want. This
wasn't possible before as the subnets were generated upfront. This
new allocator should be able to deal with this easily.

The method used for static allocation has been updated to make sure
the ascending order of 'allocated' is preserved. It's bug-for-bug
compatible with the previous implementation.

One consequence of this new algorithm is that we don't keep track
of where the last allocation happened, we just allocate the first
free subnet we find.

Before:

- Allocate: 10.0.1.0/24, 10.0.2.0/24; Deallocate: 10.0.1.0/24;
Allocate: 10.0.3.0/24.

Now, the 3rd allocation would yield 10.0.1.0/24 once again.

As it doesn't change the semantics of the allocator, there's no
reason to worry about that.

Finally, about 'reserved networks'. The heuristics we use are
now properly documented. It was discovered that we don't check
routes for IPv6 allocations -- this can't be changed because
there's no such thing as on-link routes for IPv6.

(Kudos to Rob Murray for coming up with the linear-search idea.)

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-05-23 08:24:51 +02:00
Rob Murray
f07644e17e Add netiputil.AddrPortFromNet()
Co-authored-by: Cory Snider <csnider@mirantis.com>
Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-04-15 14:51:20 +01:00
Sebastiaan van Stijn
4ff655f4b8 resolvconf: add //go:build directives to prevent downgrading to go1.16 language
Commit 8921897e3b introduced the use of `clear()`,
which requires go1.21, but Go is downgrading this file to go1.16 when used in
other projects (due to us not yet being a go module);

    0.175 + xx-go build '-gcflags=' -ldflags '-X github.com/moby/buildkit/version.Version=b53a13e -X github.com/moby/buildkit/version.Revision=b53a13e4f5c8d7e82716615e0f23656893df89af -X github.com/moby/buildkit/version.Package=github.com/moby/buildkit -extldflags '"'"'-static'"'" -tags 'osusergo netgo static_build seccomp ' -o /usr/bin/buildkitd ./cmd/buildkitd
    181.8 # github.com/docker/docker/libnetwork/internal/resolvconf
    181.8 vendor/github.com/docker/docker/libnetwork/internal/resolvconf/resolvconf.go:509:2: clear requires go1.21 or later (-lang was set to go1.16; check go.mod)
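A sketch of the pattern, assuming a file that uses `clear()`: the `//go:build go1.21` line sits above the package clause and is followed by a blank line (`resetOptions` is a hypothetical helper):

```go
//go:build go1.21

// Package main uses clear(), which needs go1.21. The go:build line above
// states that requirement so the file is not compiled with an older
// (for example go1.16) language version.
package main

import "fmt"

// resetOptions empties the map in place with the clear builtin.
func resetOptions(opts map[string]string) {
	clear(opts)
}

func main() {
	opts := map[string]string{"ndots": "2", "timeout": "1"}
	resetOptions(opts)
	fmt.Println(len(opts)) // 0
}
```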

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-03-18 12:28:21 +01:00
Sebastiaan van Stijn
1abf17c779 Merge pull request #47512 from robmry/46329_internal_resolver_ipv6_upstream
Add IPv6 nameserver to the internal DNS's upstreams.
2024-03-07 21:21:12 +01:00
Paweł Gronowski
608d77d740 Merge pull request #47497 from robmry/resolvconf_fixes
Fix 'resolv.conf' parsing issues
2024-03-07 13:05:10 +01:00
Sebastiaan van Stijn
4adc40ac40 fix duplicate words (dupwords)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-03-07 10:57:03 +01:00
Rob Murray
8921897e3b Ignore bad ndots in host resolv.conf
Rather than error out if the host's resolv.conf has a bad ndots option,
just ignore it. Still validate ndots supplied via '--dns-option' and
treat failure as an error.
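A sketch of the split between host-supplied and user-supplied options. `ndotsValue`, `hostOptions` and `userOptions` are hypothetical, and treating a negative ndots value as invalid is an assumption:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ndotsValue reports whether opt is an ndots option and, if so, whether
// its value is valid (non-negative integer, by assumption).
func ndotsValue(opt string) (isNdots bool, n int, err error) {
	v, ok := strings.CutPrefix(opt, "ndots:")
	if !ok {
		return false, 0, nil
	}
	n, err = strconv.Atoi(v)
	if err == nil && n < 0 {
		err = fmt.Errorf("negative value %d", n)
	}
	return true, n, err
}

// hostOptions silently drops a bad ndots option read from the host's resolv.conf.
func hostOptions(opts []string) []string {
	kept := make([]string, 0, len(opts))
	for _, o := range opts {
		if isNdots, _, err := ndotsValue(o); isNdots && err != nil {
			continue
		}
		kept = append(kept, o)
	}
	return kept
}

// userOptions validates options passed with '--dns-option'; a bad ndots is an error.
func userOptions(opts []string) ([]string, error) {
	for _, o := range opts {
		if isNdots, _, err := ndotsValue(o); isNdots && err != nil {
			return nil, fmt.Errorf("invalid --dns-option %q: %w", o, err)
		}
	}
	return opts, nil
}

func main() {
	fmt.Println(hostOptions([]string{"ndots:x", "rotate"})) // [rotate]
	_, err := userOptions([]string{"ndots:x"})
	fmt.Println(err != nil) // true
}
```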

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-03-07 09:27:34 +00:00
Rob Murray
4e8d9a4522 Add IPv6 nameserver to the internal DNS's upstreams.
When configuring the internal DNS resolver - rather than keep IPv6
nameservers read from the host's resolv.conf in the container's
resolv.conf, treat them like IPv4 addresses and use them as upstream
resolvers.

For IPv6 nameservers, if there's a zone identifier in the address or
the container itself doesn't have IPv6 support, mark the upstream
addresses for use in the host's network namespace.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-03-06 10:47:18 +00:00
Rob Murray
f04f69e366 Accumulate resolv.conf options
If there are multiple "options" lines, keep the options from all of
them.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-03-01 16:59:28 +00:00
Rob Murray
7f69142aa0 resolv.conf comments have '#' or ';' in the first column
When a '#' or ';' appears anywhere else, it's not a comment marker.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-03-01 16:58:04 +00:00
Rob Murray
91d9307738 Replace uses of slices.Clone()
Avoid https://github.com/golang/go/issues/64759

Co-authored-by: Bjorn Neergaard <bjorn.neergaard@docker.com>
Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-03-01 15:27:29 +00:00
Sebastiaan van Stijn
6c3b3523c9 Merge pull request #47041 from robmry/46968_refactor_resolvconf
Refactor 'resolv.conf' generation.
2024-02-29 09:33:55 +01:00
Albin Kerouanton
cbd45e83cf libnet: Replace DeleteAtomic in retry loops with DeleteIdempotent
A common pattern in libnetwork is to delete an object using
`DeleteAtomic`, i.e. to check the optimistic lock, but put in a retry
loop to refresh the data and the version index used by the optimistic
lock.

This commit introduces a new `Delete` method to delete without
checking the optimistic lock. It focuses only on the few places where
it's obvious the calling code doesn't rely on the side-effects of the
retry loop (i.e. refreshing the object to be deleted).

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-02-22 08:22:09 +01:00
Rob Murray
beb97f7fdf Refactor 'resolv.conf' generation.
Replace regex matching/replacement and re-reading of generated files
with a simple parser, and struct to remember and manipulate the file
content.

Annotate the generated file with a header comment saying the file is
generated, but can be modified, and a trailing comment describing how
the file was generated and listing external nameservers.

Always start with the host's resolv.conf file, whether generating config
for host networking, or with/without an internal resolver - rather than
editing a file previously generated for a different use-case.

Resolves an issue where rewrites of the generated file resulted in
default IPv6 nameservers being unnecessarily added to the config.

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-02-06 22:26:12 +00:00
Albin Kerouanton
83af50aee3 libnet: boltdb: inline getDBhandle()
Previous commit made getDBhandle a one-liner returning a struct
member -- making it useless. Inline it.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-02-02 09:19:07 +01:00
Albin Kerouanton
4d7c11c208 libnet: boltdb: remove PersistConnection
This parameter was used to tell the boltdb kvstore not to open/close
the underlying boltdb db file before/after each get/put operation.

Since d21d0884ae, we have a single datastore instance shared by all
components that need it. That commit set `PersistConnection=true`.
We can now safely remove this param altogether, and remove all the
code that was opening and closing the db file before and after each
operation -- it's dead code!

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-02-02 09:19:07 +01:00
Cory Snider
2200c0137f libnetwork/datastore: don't parse file path
File paths can contain commas, particularly paths returned from
t.TempDir() in subtests which include commas in their names. There is
only one datastore provider and it only supports a single address, so
the only use of parsing the address is to break tests in mysterious
ways.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2024-01-31 21:26:28 -05:00
Sebastiaan van Stijn
388216fc45 Merge pull request #46850 from robmry/46829-allow_ipv6_subnet_change
Allow overlapping change in bridge's IPv6 network.
2023-12-19 18:35:13 +01:00