Commit Graph

189 Commits

Author SHA1 Message Date
Sebastiaan van Stijn
d4d93bf558 daemon/container: remove State.ExitCode() method
This method did not provide any special handling for accessing the
field, and did not handle locking. Let's remove it for now to
not pretend we're doing anything more safe than directly accessing
the field.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-09-19 15:33:36 +01:00
Sebastiaan van Stijn
0df791cb72 explicitly access Container.State instead of through embedded struct
The Container.State struct holds the container's state, and most of
its fields are expected to change dynamically. Some o these state-changes
are explicit, for example, setting the container to be "stopped". Other
state changes can be more explicit, for example due to the containers'
process exiting or being "OOM" killed by the kernel.

The distinction between explicit ("desired") state changes and "state"
("actual state") is sometimes vague; for some properties, we clearly
separated them, for example if a user requested the container to be
stopped or restarted, we store state in the Container object itself;

    HasBeenManuallyStopped   bool // used for unless-stopped restart policy
    HasBeenManuallyRestarted bool `json:"-"` // used to distinguish restart caused by restart policy from the manual one

Other properties are more ambiguous. such as "HasBeenStartedBefore" and
"RestartCount", which are stored on the Container (and persisted to
disk), but may be more related to "actual" state, and likely should
not be persisted;

    RestartCount             int
    HasBeenStartedBefore     bool

Given that (per the above) concurrency must be taken into account, most
changes to the `container.State` struct should be protected; here's where
things get blurry. While the `State` type provides various accessor methods,
only some of them take concurrency into account; for example, [State.IsRunning]
and [State.GetPID] acquire a lock, whereas [State.ExitCodeValue] does not.
Even the (commonly used) [State.StateString] has no locking at all.

The way to handle this is error-prone; [container.State] contains a mutex,
and it's exported. Given that its embedded in the [container.Container]
struct, it's also exposed as an exported mutex for the container. The
assumption here is that by "merging" the two, the caller to acquire a lock
when either the container _or_ its state must be mutated. However, because
some methods on `container.State` handle their own locking, consumers must
be deeply familiar with the internals; if both changes to the `Container`
AND `Container.State` must be made. This gets amplified more as some
(exported!) methods, such as [container.SetRunning] mutate multiple fields,
but don't acquire a lock (so expect the caller to hold one), but their
(also exported) counterpart (e.g. [State.IsRunning]) do.

It should be clear from the above, that this needs some architectural
changes; a clearer separation between "desired" and "actual" state (opening
the potential to update the container's config without manually touching
its `State`), possibly a method to obtain a read-only copy of the current
state (for those querying state), and reviewing which fields belong where
(and should be persisted to disk, or only remain in memory).

This PR preserves the status quo; it makes no structural changes, other
than exposing where we access the container's state. Where previously the
State fields and methods were referred to as "part of the container"
(e.g. `ctr.IsRunning()` or `ctr.Running`), we now explicitly reference
the embedded `State` (`ctr.State.IsRunning`, `ctr.State.Running`).

The exception (for now) is the mutex, which is still referenced through
the embedded struct (`ctr.Lock()` instead of `ctr.State.Lock()`), as this
is (mostly) by design to protect the container, and what's in it (including
its `State`).

[State.IsRunning]: c4afa77157/daemon/container/state.go (L205-L209)
[State.GetPID]: c4afa77157/daemon/container/state.go (L211-L216)
[State.ExitCodeValue]: c4afa77157/daemon/container/state.go (L218-L228)
[State.StateString]: c4afa77157/daemon/container/state.go (L102-L131)
[container.State]: c4afa77157/daemon/container/state.go (L15-L23)
[container.Container]: c4afa77157/daemon/container/container.go (L67-L75)
[container.SetRunning]: c4afa77157/daemon/container/state.go (L230-L277)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-09-19 16:02:14 +02:00
Derek McGowan
f74e5d48b3 Create github.com/moby/moby/v2 module
Signed-off-by: Derek McGowan <derek@mcg.dev>
2025-07-31 10:13:29 -07:00
Sebastiaan van Stijn
83510a26b3 api/types: move backend types to daemon/server
The "backend" types in API were designed to decouple the API server
implementation from the daemon, or other parts of the code that
back the API server. This would allow the daemon to evolve (e.g.
functionality moved to different subsystems) without that impacting
the API server's implementation.

Now that the API server is no longer part of the API package (module),
there is no benefit to having it in the API module. The API server
may evolve (and require changes in the backend), which has no direct
relation with the API module (types, responses); the backend definition
is, however, coupled to the API server implementation.

It's worth noting that, while "technically" possible to use the API
server package, and implement an alternative backend implementation,
this has never been a prime objective. The backend definition was
never considered "stable", and we don't expect external users to
(attempt) to use it as such.

This patch moves the backend types to the daemon/server package,
so that they can evolve with the daemon and API server implementation
without that impacting the API module (which we intend to be stable,
following SemVer).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-07-28 00:03:04 +02:00
Derek McGowan
c3b0e0130a Move internal/otelutil to daemon/internal/otelutil
Signed-off-by: Derek McGowan <derek@mcg.dev>
2025-07-24 12:14:30 -07:00
Derek McGowan
afd6487b2e Create github.com/moby/moby/api module
Signed-off-by: Derek McGowan <derek@mcg.dev>
2025-07-21 09:30:05 -07:00
Derek McGowan
90f9ce14f1 Move libcontainerd to daemon/internal/libcontainerd
Signed-off-by: Derek McGowan <derek@mcg.dev>
2025-06-27 14:29:12 -07:00
Derek McGowan
5419eb1efc Move container to daemon/container
Signed-off-by: Derek McGowan <derek@mcg.dev>
2025-06-27 14:27:21 -07:00
Derek McGowan
0b2582dc8f Move internal/metrics to daemon/internal/metrics
Signed-off-by: Derek McGowan <derek@mcg.dev>
2025-06-27 14:25:45 -07:00
Sebastiaan van Stijn
5318877858 daemon: remove // import comments
These comments were added to enforce using the correct import path for
our packages ("github.com/docker/docker", not "github.com/moby/moby").
However, when working in go module mode (not GOPATH / vendor), they have
no effect, so their impact is limited.

Remove these imports in preparation of migrating our code to become an
actual go module.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-05-30 15:59:13 +02:00
Albin Kerouanton
099d3ee008 daemon: containerStart: add filtered labels to OTel span
Like for containerCreate, filter the list of container labels based on
`DOCKER_OTEL_INCLUDE_CONTAINER_LABEL_ATTRS` and put that list in the
OTel span.

Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2025-04-10 19:12:25 +02:00
Derek McGowan
0aa8fe0bf9 Update to containerd v2.0.2, buildkit v0.19.0-rc2
Update buildkit version to commit which uses 2.0

Signed-off-by: Derek McGowan <derek@mcg.dev>
2025-01-15 14:09:30 +01:00
Paweł Gronowski
51c2689427 daemon/metrics: Move out to internal/metrics
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2025-01-07 14:13:06 +01:00
Rob Murray
fe856b94b5 Configure network endpoints after creating a container
For Linux, delay construction and configuration of network endpoints
until the container has been created (but not started).

Signed-off-by: Rob Murray <rob.murray@docker.com>
2024-11-05 10:01:49 +00:00
Brian Goff
2851ddc44c Add containerd image ref to created containers
This populates the "Image" field on containerd containers, but only when
using the containerd image store.
This allows containerd clients to look up the image information.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Signed-off-by: Bjorn Neergaard <bjorn.neergaard@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-06-17 14:45:17 +02:00
Albin Kerouanton
1882da852e Merge pull request #47906 from akerouanton/libnet-add-otel-spans-v3
api, daemon, libnet: Create OTel spans at various places
2024-06-14 17:03:56 +02:00
Albin Kerouanton
6c71ebd82c libcontainerd: Start: add ctx
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-06-14 15:25:07 +02:00
Albin Kerouanton
2d8c4265c7 libcontainerd: NewTask: add ctx
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-06-14 15:25:07 +02:00
Albin Kerouanton
19f72d6fc4 libnet: add more OTel spans
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-06-14 15:25:07 +02:00
Albin Kerouanton
224d7291df container: add a span to CheckpointTo
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-06-14 15:25:07 +02:00
Albin Kerouanton
cec0d50361 libnet: add ctx to Sandbox.Destroy()
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-06-13 17:13:43 +02:00
Albin Kerouanton
8dcded102e libnet: add OTel spans to CreateEndpoint
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
2024-06-13 16:45:31 +02:00
Sebastiaan van Stijn
5343c7b451 remove internal/compatcontext and use context instead
This internal package was added in f6e44bc0e8
to preserve compatibility with go1.20 and older. At the time, our vendor.mod
still had go1.18 as minimum version requirement (see [1]), which got updated to go1.20
in 16063c7456, and go1.21 in f90b03ee5d

The version of BuildKit we use already started using context.WithoutCancel,
without a fallback, so we no longer can provide compatibility with older
versions of Go, which makes our compatiblity package redundant.

This patch removes the package, and updates our code to use stdlib's context
instead.

[1]: f6e44bc0e8/vendor.mod (L7)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-06-13 13:29:39 +02:00
Paweł Gronowski
440836a8cf Merge pull request #47003 from LarsSven/fix-container-start-time
Move StartedAt time to before starting the container
2024-05-07 14:58:27 +02:00
Lars Andringa
d4f61f92fd Move StartedAt time to before starting the container
Signed-off-by: Lars Andringa <l.s.andringa@rug.nl>
Signed-off-by: LarsSven <l.s.andringa@rug.nl>

Replaced boolean parameter by IsZero check

Signed-off-by: LarsSven <l.s.andringa@rug.nl>

Separated SetRunning into two functions

Signed-off-by: LarsSven <l.s.andringa@rug.nl>

Apply suggestions from code review

Documentation fixes

Co-authored-by: Paweł Gronowski <me@woland.xyz>
Signed-off-by: LarsSven <l.s.andringa@rug.nl>
2024-03-12 16:20:21 +01:00
Sebastiaan van Stijn
8758d08bb4 api: remove handling of HostConfig on POST /containers/{id}/start (api < v1.24)
API v1.20 (Docker Engine v1.11.0) and older allowed a HostConfig to be passed
when starting a container. This feature was deprecated in API v1.21 (Docker
Engine v1.10.0) in 3e7405aea8, and removed in
API v1.23 (Docker Engine v1.12.0) in commit 0a8386c8be.

API v1.23 and older are deprecated, and this patch removes the feature.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-02-06 18:44:44 +01:00
Sebastiaan van Stijn
2970b320aa api: remove code for adjusting CPU shares (api < v1.19)
API versions before 1.19 allowed CpuShares that were greater than the maximum
or less than the minimum supported by the kernel, and relied on the kernel to
do the right thing.

Commit ed39fbeb2a introduced code to adjust the
CPU shares to be within the accepted range when using API version 1.18 or
lower.

API v1.23 and older are deprecated, so we can remove support for this
functionality.

Currently, there's no validation for CPU shares to be within an acceptable
range; a TODO was added to add validation for this option, and to use the
`linuxMinCPUShares` and `linuxMaxCPUShares` consts for this.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-02-06 18:44:33 +01:00
Cory Snider
0046b16d87 daemon: set libnetwork sandbox key w/o OCI hook
Signed-off-by: Cory Snider <csnider@mirantis.com>
2024-01-19 20:23:12 +00:00
Paweł Gronowski
5bbcc41c20 volumes/subpath: Plumb context
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2024-01-19 17:32:21 +01:00
Paweł Gronowski
bfb810445c volumes: Implement subpath mount
`VolumeOptions` now has a `Subpath` field which allows to specify a path
relative to the volume that should be mounted as a destination.

Symlinks are supported, but they cannot escape the base volume
directory.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2024-01-19 17:32:10 +01:00
Paweł Gronowski
f07387466a daemon/oci: Extract side effects from withMounts
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2024-01-19 17:27:16 +01:00
Cory Snider
659d7b190f libcontainerd: create unstarted tasks
Split task creation and start into two separate method calls in the
libcontainerd API. Clients now have the opportunity to inspect the
freshly-created task and customize its runtime environment before
starting execution of the user-specified binary.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2024-01-10 13:50:26 -05:00
Sebastiaan van Stijn
484e6b784c api/types: move ContainerCreateConfig, ContainerRmConfig to api/types/backend
The `ContainerCreateConfig` and `ContainerRmConfig` structs are used for
options to be passed to the backend, and are not used in client code.

Thess struct currently is intended for internal use only (for example, the
`AdjustCPUShares` is an internal implementation details to adjust the container's
config when older API versions are used).

Somewhat ironically, the signature of the Backend has a nicer UX than that
of the client's `ContainerCreate` signature (which expects all options to
be passed as separate arguments), so we may want to update that signature
to be closer to what the backend is using, but that can be left as a future
exercise.

This patch moves the `ContainerCreateConfig` and `ContainerRmConfig` structs
to the backend package to prevent it being imported in the client, and to make
it more clear that this is part of internal APIs, and not public-facing.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-12-05 16:41:36 +01:00
Sebastiaan van Stijn
cff4f20c44 migrate to github.com/containerd/log v0.1.0
The github.com/containerd/containerd/log package was moved to a separate
module, which will also be used by upcoming (patch) releases of containerd.

This patch moves our own uses of the package to use the new module.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-10-11 17:52:23 +02:00
Sebastiaan van Stijn
0f871f8cb7 api/types/events: define "Action" type and consts
Define consts for the Actions we use for events, instead of "ad-hoc" strings.
Having these consts makes it easier to find where specific events are triggered,
makes the events less error-prone, and allows documenting each Action (if needed).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-29 00:38:08 +02:00
Sebastiaan van Stijn
80d158e0de daemon: remove containerNotModifiedError
Removing this type, because:

- containerNotModifiedError is not an actual error, and abstracting it away
  was hiding some of these details. It also wasn't used as a sentinel error
  anywhere, so doesn't have to be its own type.
- Defining a type just to toggle the error-message between "not running"
  and "not stopped" felt a bit over-the-top, as each variant was only used once.
- So  "it only had one job", and it didn't even do that right; it produced
  capitalized error messages, which makes linters unhappy.

So, let's just inline what it does in the two places it was used.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-11 21:28:06 +02:00
Sebastiaan van Stijn
dffe634c19 daemon: Daemon.ContainerStart(): make validateState a regular function
There's no need for this to be a closure; let's just make it a regular
function. While moving it out, also make some minor code-changes and
add some code-comments to describe the flow / intent, which may not
be trivial for people that are not familiar with these details.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-11 21:28:06 +02:00
Sebastiaan van Stijn
bf1fb97575 daemon: Daemon.containerStart(): add comment to clarify error-type
Any error that occurs while creating the spec, even if it's the
result of an invalid container config, must be considered a System
error (internal server error), as it's not an error with the request
to start the container.

Invalid configuration in the config itself must be validated when
creating the container (creating its config), but some errors are
dependent on the current state, for example when starting a container
that shares a namespace with another container, and that container
is not running (or missing).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-11 14:47:22 +02:00
Brian Goff
74da6a6363 Switch all logging to use containerd log pkg
This unifies our logging and allows us to propagate logging and trace
contexts together.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2023-06-24 00:23:44 +00:00
Djordje Lukic
32d58144fd c8d: Use reference counting while mounting a snapshot
Some snapshotters (like overlayfs or zfs) can't mount the same
directories twice. For example if the same directroy is used as an upper
directory in two mounts the kernel will output this warning:

    overlayfs: upperdir is in-use as upperdir/workdir of another mount, accessing files from both mounts will result in undefined behavior.

And indeed accessing the files from both mounts will result in an "No
such file or directory" error.

This change introduces reference counts for the mounts, if a directory
is already mounted the mount interface will only increment the mount
counter and return the mount target effectively making sure that the
filesystem doesn't end up in an undefined behavior.

Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>
2023-06-07 15:50:01 +02:00
Cory Snider
d222bf097c daemon: reload runtimes w/o breaking containers
The existing runtimes reload logic went to great lengths to replace the
directory containing runtime wrapper scripts as atomically as possible
within the limitations of the Linux filesystem ABI. Trouble is,
atomically swapping the wrapper scripts directory solves the wrong
problem! The runtime configuration is "locked in" when a container is
started, including the path to the runC binary. If a container is
started with a runtime which requires a daemon-managed wrapper script
and then the daemon is reloaded with a config which no longer requires
the wrapper script (i.e. some args -> no args, or the runtime is dropped
from the config), that container would become unmanageable. Any attempts
to stop, exec or otherwise perform lifecycle management operations on
the container are likely to fail due to the wrapper script no longer
existing at its original path.

Atomically swapping the wrapper scripts is also incompatible with the
read-copy-update paradigm for reloading configuration. A handler in the
daemon could retain a reference to the pre-reload configuration for an
indeterminate amount of time after the daemon configuration has been
reloaded and updated. It is possible for the daemon to attempt to start
a container using a deleted wrapper script if a request to run a
container races a reload.

Solve the problem of deleting referenced wrapper scripts by ensuring
that all wrapper scripts are *immutable* for the lifetime of the daemon
process. Any given runtime wrapper script must always exist with the
same contents, no matter how many times the daemon config is reloaded,
or what changes are made to the config. This is accomplished by using
everyone's favourite design pattern: content-addressable storage. Each
wrapper script file name is suffixed with the SHA-256 digest of its
contents to (probabilistically) guarantee immutability without needing
any concurrency control. Stale runtime wrapper scripts are only cleaned
up on the next daemon restart.

Split the derived runtimes configuration from the user-supplied
configuration to have a place to store derived state without mutating
the user-supplied configuration or exposing daemon internals in API
struct types. Hold the derived state and the user-supplied configuration
in a single struct value so that they can be updated as an atomic unit.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:25 -04:00
Cory Snider
0b592467d9 daemon: read-copy-update the daemon config
Ensure data-race-free access to the daemon configuration without
locking by mutating a deep copy of the config and atomically storing
a pointer to the copy into the daemon-wide configStore value. Any
operations which need to read from the daemon config must capture the
configStore value only once and pass it around to guarantee a consistent
view of the config.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2023-06-01 14:45:24 -04:00
Djordje Lukic
0137446248 Implement run using the containerd snapshotter
Signed-off-by: Djordje Lukic <djordje.lukic@docker.com>

c8d/daemon: Mount root and fill BaseFS

This fixes things that were broken due to nil BaseFS like `docker cp`
and running a container with workdir override.

This is more of a temporary hack than a real solution.
The correct fix would be to refactor the code to make BaseFS and LayerRW
an implementation detail of the old image store implementation and use
the temporary mounts for the c8d implementation instead.
That requires more work though.

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>

daemon/images: Don't unset BaseFS

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2023-02-06 18:21:50 +01:00
Sebastiaan van Stijn
42f1be8030 daemon: translateContainerdStartErr(): rename to setExitCodeFromError()
This should hopefully make it slightly clearer what it does.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-28 09:27:42 +01:00
Sebastiaan van Stijn
a756fa60ef daemon: translateContainerdStartErr(): use const/enum for exit-statuses
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-28 09:27:41 +01:00
Sebastiaan van Stijn
2cf09c5446 daemon: translateContainerdStartErr(): remove unused cmd argument
This argument was no longer used since commit 225e046d9d

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-28 09:27:41 +01:00
Sebastiaan van Stijn
087369aeeb daemon: containerStart(): rename return variable
Rename the variable make it more visible where it's used, as there's were
other "err" variables masking it.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-12-28 09:27:37 +01:00
Cory Snider
0141c6db81 daemon: don't checkpoint container until registered
(*Container).CheckpointTo() upserts a snapshot of the container to the
daemon's in-memory ViewDB and also persists the snapshot to disk. It
does not register the live container object with the daemon's container
store, however. The ViewDB and container store are used as the source of
truth for different operations, so having a container registered in one
but not the other can result in inconsistencies. In particular, the List
Containers API uses the ViewDB as its source of truth and the Container
Inspect API uses the container store.

The (*Daemon).setHostConfig() method is called fairly early in the
process of creating a container, long before the container is registered
in the daemon's container store. Due to a rogue CheckpointTo() call
inside setHostConfig(), there is a window of time where a container can
be included in a List Containers API response but "not exist" according
to the Container Inspect API and similar endpoints which operate on a
particular container. Remove the rogue call so that the caller has full
control over when the container is checkpointed and update callers to
checkpoint explicitly. No changes to (*Daemon).create() are needed as it
checkpoints the fully-created container via (*Daemon).Register().

Fixes #44512.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-12-12 15:53:49 -05:00
Paweł Gronowski
a181a825c8 daemon/start: Revert passing ctx to ctr.Start
This caused integration tests to timeout in the CI

Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2022-11-03 12:22:44 +01:00
Nicolas De Loof
def549c8f6 imageservice: Add context to various methods
Co-authored-by: Paweł Gronowski <pawel.gronowski@docker.com>
Signed-off-by: Paweł Gronowski <pawel.gronowski@docker.com>
2022-11-03 12:22:40 +01:00