Commit Graph

1130 Commits

Author SHA1 Message Date
Justin Chadwell
e0b065cc33 daemon: use the registry service config for getting registry hosts
The RegistryHosts lookup function is used by both BuildKit and by the
containerd snapshotter. However, this function differs in behaviour from
the config parser for the RegistryConfig:

- The protocol for insecure registries is treated as significant by
  RegistryHosts, while the RegistryConfig strips this information.
- RegistryConfig validates and deduplicates mirrors.
- RegistryConfig does not parse the insecure-registries as URLs, which
  can lead to parsing opaque URLs as was possible by the RegistryHosts
  function.

This patch updates the lookup function to ensure consistency.

Signed-off-by: Justin Chadwell <me@jedevc.com>
2023-07-17 12:30:34 +01:00
Bjorn Neergaard
8805e38398 Merge pull request #45799 from cpuguy83/containerd_logrus
Switch all logging to use containerd log pkg
2023-06-26 11:51:44 -06:00
Brian Goff
74da6a6363 Switch all logging to use containerd log pkg
This unifies our logging and allows us to propagate logging and trace
contexts together.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2023-06-24 00:23:44 +00:00
Cory Snider
165dfd6c3e daemon: fix restoring container with missing task
Before 4bafaa00aa, if the daemon was
killed while a container was running and the container shim is killed
before the daemon is restarted, such as if the host system is
hard-rebooted, the daemon would restore the container to the stopped
state and set the exit code to 255. The aforementioned commit introduced
a regression where the container's exit code would instead be set to 0.
Fix the regression so that the exit code is once against set to 255 on
restore.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-23 11:28:45 -04:00
Djordje Lukic
32d58144fd c8d: Use reference counting while mounting a snapshot
Some snapshotters (like overlayfs or zfs) can't mount the same
directories twice. For example if the same directroy is used as an upper
directory in two mounts the kernel will output this warning:

    overlayfs: upperdir is in-use as upperdir/workdir of another mount, accessing files from both mounts will result in undefined behavior.

And indeed accessing the files from both mounts will result in an "No
such file or directory" error.

This change introduces reference counts for the mounts, if a directory
is already mounted the mount interface will only increment the mount
counter and return the mount target effectively making sure that the
filesystem doesn't end up in an undefined behavior.

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
2023-06-07 15:50:01 +02:00
Cory Snider
0f6eeecac0 daemon: consolidate runtimes config validation
The daemon has made a habit of mutating the DefaultRuntime and Runtimes
values in the Config struct to merge defaults. This would be fine if it
was a part of the regular configuration loading and merging process,
as is done with other config options. The trouble is it does so in
surprising places, such as in functions with 'verify' or 'validate' in
their name. It has been necessary in order to validate that the user has
not defined a custom runtime named "runc" which would shadow the
built-in runtime of the same name. Other daemon code depends on the
runtime named "runc" always being defined in the config, but merging it
with the user config at the same time as the other defaults are merged
would trip the validation. The root of the issue is that the daemon has
used the same config values for both validating the daemon runtime
configuration as supplied by the user and for keeping track of which
runtimes have been set up by the daemon. Now that a completely separate
value is used for the latter purpose, surprising contortions are no
longer required to make the validation work as intended.

Consolidate the validation of the runtimes config and merging of the
built-in runtimes into the daemon.setupRuntimes() function. Set the
result of merging the built-in runtimes config and default default
runtime on the returned runtimes struct, without back-propagating it
onto the config.Config argument.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:25 -04:00
Cory Snider
d222bf097c daemon: reload runtimes w/o breaking containers
The existing runtimes reload logic went to great lengths to replace the
directory containing runtime wrapper scripts as atomically as possible
within the limitations of the Linux filesystem ABI. Trouble is,
atomically swapping the wrapper scripts directory solves the wrong
problem! The runtime configuration is "locked in" when a container is
started, including the path to the runC binary. If a container is
started with a runtime which requires a daemon-managed wrapper script
and then the daemon is reloaded with a config which no longer requires
the wrapper script (i.e. some args -> no args, or the runtime is dropped
from the config), that container would become unmanageable. Any attempts
to stop, exec or otherwise perform lifecycle management operations on
the container are likely to fail due to the wrapper script no longer
existing at its original path.

Atomically swapping the wrapper scripts is also incompatible with the
read-copy-update paradigm for reloading configuration. A handler in the
daemon could retain a reference to the pre-reload configuration for an
indeterminate amount of time after the daemon configuration has been
reloaded and updated. It is possible for the daemon to attempt to start
a container using a deleted wrapper script if a request to run a
container races a reload.

Solve the problem of deleting referenced wrapper scripts by ensuring
that all wrapper scripts are *immutable* for the lifetime of the daemon
process. Any given runtime wrapper script must always exist with the
same contents, no matter how many times the daemon config is reloaded,
or what changes are made to the config. This is accomplished by using
everyone's favourite design pattern: content-addressable storage. Each
wrapper script file name is suffixed with the SHA-256 digest of its
contents to (probabilistically) guarantee immutability without needing
any concurrency control. Stale runtime wrapper scripts are only cleaned
up on the next daemon restart.

Split the derived runtimes configuration from the user-supplied
configuration to have a place to store derived state without mutating
the user-supplied configuration or exposing daemon internals in API
struct types. Hold the derived state and the user-supplied configuration
in a single struct value so that they can be updated as an atomic unit.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:25 -04:00
Cory Snider
0b592467d9 daemon: read-copy-update the daemon config
Ensure data-race-free access to the daemon configuration without
locking by mutating a deep copy of the config and atomically storing
a pointer to the copy into the daemon-wide configStore value. Any
operations which need to read from the daemon config must capture the
configStore value only once and pass it around to guarantee a consistent
view of the config.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:24 -04:00
Cory Snider
038449467e Update BuildKit registry config on daemon reload
Historically, daemon.RegistryHosts() has returned a docker.RegistryHosts
callback function which closes over a point-in-time snapshot of the
daemon configuration. When constructing the BuildKit builder at daemon
startup, the return value of daemon.RegistryHosts() has been used.
Therefore the BuildKit builder would use the registry configuration as
it was at daemon startup for the life of the process, even if the
registry configuration is changed and the configuration reloaded.
Provide BuildKit with a RegistryHosts callback which reflects the
live daemon configuration after reloads so that registry operations
performed by BuildKit always use the same configuration as the rest of
the daemon.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:21 -04:00
Cory Snider
982e4fb448 api/server: get features from a callback fn
Passing around a bare pointer to the map of configured features in order
to propagate to consumers changes to the configuration across reloads is
dangerous. Map operations are not atomic, so concurrently reading from
the map while it is being updated is a data race as there is no
synchronization. Use a getter function to retrieve the current features
map so the features can be retrieved race-free.

Remove the unused features argument from the build router.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:43:27 -04:00
Cory Snider
9b9c5242eb daemon: lock in snapshotter setting at daemon init
Feature flags are one of the configuration items which can be reloaded
without restarting the daemon. Whether the daemon uses the containerd
snapshotter service or the legacy graph drivers is controlled by a
feature flag. However, much of the code which checks the snapshotter
feature flag assumes that the flag cannot change at runtime. Make it so
that the snapshotter setting can only be changed by restarting the
daemon, even if the flag state changes after a live configuration
reload.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-05-24 16:56:17 -04:00
Kevin Alvarez
6d139e5e95 build: use daemon id as worker id for the graph driver controller
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
2023-05-18 21:17:29 +02:00
Sebastiaan van Stijn
fb96b94ed0 daemon: remove handling for deprecated "oom-score-adjust", and produce error
This option was deprecated in 5a922dc162, which
is part of the v24.0.0 release, so we can remove it from master.

This patch;

- adds a check to ValidatePlatformConfig, and produces a fatal error
  if oom-score-adjust is set
- removes the deprecated libcontainerd/supervisor.WithOOMScore
- removes the warning from docker info

With this patch:

    dockerd --oom-score-adjust=-500 --validate
    Flag --oom-score-adjust has been deprecated, and will be removed in the next release.
    unable to configure the Docker daemon with file /etc/docker/daemon.json: merged configuration validation from file and command line flags failed: DEPRECATED: The "oom-score-adjust" config parameter and the dockerd "--oom-score-adjust" options have been removed.

And when using `daemon.json`:

    dockerd --validate
    unable to configure the Docker daemon with file /etc/docker/daemon.json: merged configuration validation from file and command line flags failed: DEPRECATED: The "oom-score-adjust" config parameter and the dockerd "--oom-score-adjust" options have been removed.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-05-06 16:36:17 +02:00
Sebastiaan van Stijn
a5d46a15f5 split GetRepository from ImageService
The GetRepository method interacts directly with the registry, and does
not depend on the snapshotter, but is used for two purposes;

For the GET /distribution/{name:.*}/json route;
dd3b71d17c/api/server/router/distribution/backend.go (L11-L15)

And to satisfy the "executor.ImageBackend" interface as used by Swarm;
58c027ac8b/daemon/cluster/executor/backend.go (L77)

This patch removes the method from the ImageService interface, and instead
implements it through an composite struct that satisfies both interfaces,
and an ImageBackend() method is added to the daemon.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>

remove GetRepository from ImageService

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-04-09 12:07:57 +02:00
Djordje Lukic
15b9176d53 Add the events services to the containerd image service
No events are sent yet, these will come at a later stage.

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
2023-03-30 17:48:51 +02:00
Cory Snider
7b3acdff5d registry: return concrete service type
Move interface definitions to the packages which use the registry
service.

https://github.com/golang/go/wiki/CodeReviewComments#interfaces

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-10 18:38:08 -05:00
Cory Snider
3991faf464 Move filtered registry search out of image service
SearchRegistryForImages does not make sense as part of the image
service interface. The implementation just wraps the search API of the
registry service to filter the results client-side. It has nothing to do
with local image storage, and the implementation of search does not need
to change when changing which backend (graph driver vs. containerd
snapshotter) is used for local image storage.

Filtering of the search results is an implementation detail: the
consumer of the results does not care which actor does the filtering so
long as the results are filtered as requested. Move filtering into the
exported API of the registry service to hide the implementation details.
Only one thing---the registry service implementation---would need to
change in order to support server-side filtering of search results if
Docker Hub or other registry servers were to add support for it to their
APIs.

Use a fake registry server in the search unit tests to avoid having to
mock out the registry API client.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-10 18:36:33 -05:00
Nicolas De Loof
06619763a2 remove GetLayerByID from ImageService interface
Co-authored-by: Nicolas De Loof <nicolas.deloof@gmail.com>
Co-authored-by: Paweł Gronowski <pawel.gronowski@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-03-10 17:54:55 +01:00
Nicolas De Loof
168ca2dcc8 Introduce support for docker commit
Signed-off-by: Laura Brehm <laurabrehm@hey.com>
Co-authored-by: Laura Brehm <laurabrehm@hey.com>
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Co-authored-by: Paweł Gronowski <pawel.gronowski@docker.com>
Co-authored-by: Nicolas De Loof <nicolas.deloof@gmail.com>
2023-03-06 15:11:36 +01:00
Sebastiaan van Stijn
11261594d8 Merge pull request #45032 from corhere/shim-opts
daemon: allow shimv2 runtimes to be configured
2023-03-02 21:45:05 +01:00
Cory Snider
a9e7360775 daemon/config: remove AuthzMiddleware field
The authorization.Middleware contains a sync.Mutex field, making it
non-copyable. Remove one of the barriers to allowing deep copies of
config.Config values.

Inject the middleware into Daemon as a constructor argument instead.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-03-01 09:43:39 -05:00
Cory Snider
b0eed5ade6 daemon: allow shimv2 runtimes to be configured
Kubernetes only permits RuntimeClass values which are valid lowercase
RFC 1123 labels, which disallows the period character. This prevents
cri-dockerd from being able to support configuring alternative shimv2
runtimes for a pod as shimv2 runtime names must contain at least one
period character. Add support for configuring named shimv2 runtimes in
daemon.json so that runtime names can be aliased to
Kubernetes-compatible names.

Allow options to be set on shimv2 runtimes in daemon.json.

The names of the new daemon runtime config fields have been selected to
correspond with the equivalent field names in cri-containerd's
configuration so that users can more easily follow documentation from
the runtime vendor written for cri-containerd and apply it to
daemon.json.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-02-17 18:08:06 -05:00
Djordje Lukic
0137446248 Implement run using the containerd snapshotter
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>

c8d/daemon: Mount root and fill BaseFS

This fixes things that were broken due to nil BaseFS like `docker cp`
and running a container with workdir override.

This is more of a temporary hack than a real solution.
The correct fix would be to refactor the code to make BaseFS and LayerRW
an implementation detail of the old image store implementation and use
the temporary mounts for the c8d implementation instead.
That requires more work though.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>

daemon/images: Don't unset BaseFS

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-02-06 18:21:50 +01:00
Cory Snider
f96b9bf761 libnetwork: return concrete-typed *Controller
libnetwork.NetworkController is an interface with a single
implementation.

https://github.com/golang/go/wiki/CodeReviewComments#interfaces

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-01-13 14:09:37 -05:00
Bjorn Neergaard
6d212fa045 Merge pull request #44756 from rumpl/containerd-image-pull
containerd integration: image pull
2023-01-11 16:22:48 -07:00
Paweł Gronowski
9032e6779d c8d/resolver: Fallback to http for insecure registries
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-01-11 17:00:27 +01:00
Nicolas De Loof
c83fce86d4 c8d/resolver: Use hosts from daemon configuration
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
Signed-off-by: Nicolas De Loof <nicolas.deloof@gmail.com>
2023-01-11 17:00:27 +01:00
Sebastiaan van Stijn
6549a270e9 container: ViewDB: return typed system errors
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-08 14:33:57 +01:00
Tianon Gravi
cd8a090e67 Merge pull request #44329 from thaJeztah/remove_trustkey_id_migration
Remove trustkey id migration and config.TrustKeyPath
2022-12-01 12:49:54 -08:00
Paweł Gronowski
dec81e489f daemon/disk_usage: Use context aware singleflight
The singleflight function was capturing the context.Context of the first
caller that invoked the `singleflight.Do`. This could cause all
concurrent calls to be cancelled when the first request is cancelled.

singleflight calls were also moved from the ImageService to Daemon, to
avoid having to implement this logic in both graphdriver and containerd
based image services.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2022-11-29 16:46:19 +01:00
Sebastiaan van Stijn
8feeaecb84 use ad-hoc libtrust key
This is only used for tests, and the key is not verified anymore, so
instead of creating a key and storing it, we can just use an ad-hoc
one.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-11-28 20:29:19 +01:00
Sebastiaan van Stijn
5cdd6ab7cd daemon/config: remove TrustKeyPath, and local utilities
Turned out that the loadOrCreateTrustKey() utility was doing exactly the
same as libtrust.LoadOrCreateTrustKey(), so making it a thin wrapped. I kept
the tests to verify the behavior, but we could remove them as we only need this
for our integration tests.

The storage location for the generated key was changed (again as we only need
this for some integration tests), so we can remove the TrustKeyPath from the
config.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-11-28 20:29:19 +01:00
Sebastiaan van Stijn
1981706196 daemon: remove migrateTrustKeyID()
The migration code is in the 22.06 branch, and if we don't migrate
the only side-effect is the daemon's ID being regenerated (as a
UUID).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-11-28 20:28:55 +01:00
Nicolas De Loof
def549c8f6 imageservice: Add context to various methods
Co-authored-by: Paweł Gronowski <pawel.gronowski@docker.com>
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2022-11-03 12:22:40 +01:00
Sebastiaan van Stijn
e7904c5faa Merge pull request #44309 from thaJeztah/daemon_check_requirements
daemon: NewDaemon(): check system requirements early
2022-11-01 13:42:44 +01:00
Sebastiaan van Stijn
ef7e4ec3c6 Merge pull request #44317 from thaJeztah/daemon_mkdir
daemon: NewDaemon(): replace system.MkdirAll for os.Mkdir where possible
2022-11-01 13:41:16 +01:00
Brian Goff
6c5ca9779b Merge pull request #44310 from thaJeztah/daemon_getPluginExecRoot
daemon: getPluginExecRoot(): pass config
2022-10-25 11:52:35 -07:00
Sebastiaan van Stijn
51fe170224 daemon: NewDaemon() fix import colliding with local variable
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-18 16:04:43 +02:00
Sebastiaan van Stijn
27bd49f4bf daemon: NewDaemon(): replace system.MkdirAll for os.Mkdir where possible
`system.MkdirAll()` is a special version of os.Mkdir to handle creating directories
using Windows volume paths (`"\\?\Volume{4c1b02c1-d990-11dc-99ae-806e6f6e6963}"`).
This may be important when `MkdirAll` is used, which traverses all parent paths to
create them if missing (ultimately landing on the "volume" path).

The daemon.NewDaemon() function used `system.MkdirAll()` in various places where
a subdirectory within `daemon.Root` was created. This appeared to be mostly out
of convenience (to not have to handle `os.ErrExist` errors). The `daemon.Root`
directory should already be set up in these locations, and should be set up with
correct permissions. Using `system.MkdirAll()` would potentially mask errors if
the root directory is missing, and instead set up parent directories (possibly
with incorrect permissions).

Because of the above, this patch changes `system.MkdirAll` to `os.Mkdir`. As we
are changing these lines, this patch also changes the legacy octal notation
(`0700`) to the now preferred `0o700`.

One location continues to use `system.MkdirAll`, as the temp-directory may be
configured to be outside of `daemon.Root`, but a redundant `os.Stat(realTmp)`
was removed, as `system.MkdirAll` is expected to handle this.

As we are changing these lines, this patch also changes the legacy octal notation
(`0700`) to the now preferred `0o700`.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-18 16:04:40 +02:00
Sebastiaan van Stijn
19c5d21e6f daemon: getPluginExecRoot(): pass config
This makes it more transparent that it's unused for Linux,
and we don't pass "root", which has no relation with the
path on Linux.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-17 15:22:10 +02:00
Sebastiaan van Stijn
17fb29c9e8 daemon: NewDaemon(): check system requirements early
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-17 15:15:55 +02:00
Sebastiaan van Stijn
7ff0f654fb daemon: add TEST_INTEGRATION_USE_SNAPSHOTTER for CI
This allows us to run CI with the containerd snapshotter enabled, without
patching the daemon.json, or changing how tests set up daemon flags.

A warning log is added during startup, to inform if this variable is set,
as it should only be used for our integration tests.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-17 15:13:53 +02:00
Sebastiaan van Stijn
0a004fd361 daemon: NewDaemon(): log message if containerd snapshotter is enabled
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-10-17 15:00:10 +02:00
Brian Goff
4c0e0979b4 Fix live-restore w/ restart policies + volume refs
Before this change restarting the daemon in live-restore with running
containers + a restart policy meant that volume refs were not restored.
This specifically happens when the container is still running *and*
there is a restart policy that would make sure the container was running
again on restart.

The bug allows volumes to be removed even though containers are
referencing them. 😱

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2022-09-30 22:30:58 +00:00
Sebastiaan van Stijn
173d16b233 Merge pull request #44193 from thaJeztah/libnetwork_cleanup
libnetwork: cleanup config package, remove old integration tests
2022-09-27 22:41:32 +02:00
Sebastiaan van Stijn
a8a8bd1e42 libnetwork/config: remove "Experimental" and "Debug" options
These were no longer used.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-09-26 12:05:22 +02:00
Cory Snider
95824f2b5f pkg/containerfs: simplify ContainerFS type
Iterate towards dropping the type entirely.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-09-23 16:56:49 -04:00
Sebastiaan van Stijn
511a909ae6 container: remove ViewDB and View interfaces, use concrete types
These interfaces were added in aacddda89d, with
no clear motivation, other than "Also hide ViewDB behind an interface".

This patch removes the interface in favor of using a concrete implementation;
There's currently only one implementation of this interface, and if we would
decide to change to an alternative implementation, we could define relevant
interfaces on the receiver side.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-09-21 17:38:45 +02:00
Sebastiaan van Stijn
670ce6785d Merge pull request #44091 from rumpl/fix-local-context
Wrap local calls to the content and lease service
2022-09-06 18:49:43 +02:00
Djordje Lukic
878906630b Wrap local calls to the content and lease service
The wrapper sets the default namespace in the context if none is
provided, this is needed because we are calling these services directly
and not trough GRPC that has an interceptor to set the default namespace
to all calls.

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
2022-09-06 17:33:19 +02:00