Files
moby/daemon/delete.go
Sebastiaan van Stijn 0df791cb72 explicitly access Container.State instead of through embedded struct
The Container.State struct holds the container's state, and most of
its fields are expected to change dynamically. Some of these state changes
are explicit, for example, setting the container to be "stopped". Other
state changes are more implicit, for example due to the container's
process exiting or being "OOM" killed by the kernel.

The distinction between explicit ("desired") state changes and implicit
("actual") state is sometimes vague; for some properties, we clearly
separated them. For example, if a user requested the container to be
stopped or restarted, we store that state on the Container object itself;

    HasBeenManuallyStopped   bool // used for unless-stopped restart policy
    HasBeenManuallyRestarted bool `json:"-"` // used to distinguish restart caused by restart policy from the manual one

Other properties are more ambiguous, such as "HasBeenStartedBefore" and
"RestartCount", which are stored on the Container (and persisted to
disk), but may be more related to "actual" state, and likely should
not be persisted;

    RestartCount             int
    HasBeenStartedBefore     bool

Given that (per the above) concurrency must be taken into account, most
changes to the `container.State` struct should be protected; here's where
things get blurry. While the `State` type provides various accessor methods,
only some of them take concurrency into account; for example, [State.IsRunning]
and [State.GetPID] acquire a lock, whereas [State.ExitCodeValue] does not.
Even the (commonly used) [State.StateString] has no locking at all.
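
To illustrate (a minimal sketch; the methods are the ones referenced above,
but the surrounding caller code is illustrative, not taken from the tree):

    // Safe on its own: IsRunning acquires the State mutex internally.
    if ctr.State.IsRunning() {
        // ...
    }

    // Not safe on its own: ExitCodeValue and StateString read fields
    // without locking, so the caller must hold the (shared) lock itself.
    ctr.Lock()
    exitCode := ctr.State.ExitCodeValue()
    status := ctr.State.StateString()
    ctr.Unlock()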

The way this is currently handled is error-prone; [container.State] contains
a mutex, and it's exported. Given that it's embedded in the [container.Container]
struct, it's also exposed as an exported mutex for the container. The
assumption here is that by "merging" the two, the caller only needs to acquire
a single lock when either the container _or_ its state must be mutated. However,
because some methods on `container.State` handle their own locking, consumers
must be deeply familiar with the internals when changes to both the `Container`
AND `Container.State` must be made. This gets amplified further because some
(exported!) methods, such as [container.SetRunning], mutate multiple fields
but don't acquire a lock (so they expect the caller to hold one), whereas their
(also exported) counterparts (e.g. [State.IsRunning]) do.
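
As a sketch of what the embedded mutex implies for callers (the mutation
below mirrors the pattern used in this file; it's illustrative, not new API):

    // The container and its state share one mutex, so a single Lock()
    // is meant to cover mutations to either of them.
    ctr.Lock()
    ctr.State.Dead = true // mutate a State field...
    ctr.RWLayer = nil     // ...and a Container field under the same lock.
    ctr.Unlock()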

It should be clear from the above that this needs some architectural
changes; a clearer separation between "desired" and "actual" state (opening
the potential to update the container's config without manually touching
its `State`), possibly a method to obtain a read-only copy of the current
state (for those querying state), and reviewing which fields belong where
(and should be persisted to disk, or only remain in memory).
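
Such a read-only copy could hypothetically look something like the sketch
below (the type, method, and field selection are illustrative only, and not
part of this change):

    // StateSnapshot is a hypothetical plain-data copy of State (no mutex),
    // for callers that only need to query the current state.
    type StateSnapshot struct {
        Running bool
        Paused  bool
        Dead    bool
    }

    // Snapshot copies the fields while holding the lock, so readers never
    // touch the live, mutable State struct.
    func (s *State) Snapshot() StateSnapshot {
        s.Lock()
        defer s.Unlock()
        return StateSnapshot{Running: s.Running, Paused: s.Paused, Dead: s.Dead}
    }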

This PR preserves the status quo; it makes no structural changes, other
than exposing where we access the container's state. Where previously the
State fields and methods were referred to as "part of the container"
(e.g. `ctr.IsRunning()` or `ctr.Running`), we now explicitly reference
the embedded `State` (`ctr.State.IsRunning`, `ctr.State.Running`).
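
For example (matching the pattern used in the file below):

    // Before:
    if ctr.IsRunning() {
        // ...
    }

    // After:
    if ctr.State.IsRunning() {
        // ...
    }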

The exception (for now) is the mutex, which is still referenced through
the embedded struct (`ctr.Lock()` instead of `ctr.State.Lock()`), as this
is (mostly) by design to protect the container, and what's in it (including
its `State`).

[State.IsRunning]: c4afa77157/daemon/container/state.go (L205-L209)
[State.GetPID]: c4afa77157/daemon/container/state.go (L211-L216)
[State.ExitCodeValue]: c4afa77157/daemon/container/state.go (L218-L228)
[State.StateString]: c4afa77157/daemon/container/state.go (L102-L131)
[container.State]: c4afa77157/daemon/container/state.go (L15-L23)
[container.Container]: c4afa77157/daemon/container/container.go (L67-L75)
[container.SetRunning]: c4afa77157/daemon/container/state.go (L230-L277)

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-09-19 16:02:14 +02:00

183 lines
6.2 KiB
Go

package daemon

import (
	"context"
	"fmt"
	"os"
	"path"
	"strings"
	"time"

	"github.com/containerd/log"
	"github.com/moby/moby/api/types/events"
	"github.com/moby/moby/v2/daemon/config"
	"github.com/moby/moby/v2/daemon/container"
	"github.com/moby/moby/v2/daemon/internal/containerfs"
	"github.com/moby/moby/v2/daemon/internal/metrics"
	"github.com/moby/moby/v2/daemon/server/backend"
	"github.com/moby/moby/v2/errdefs"
	"github.com/opencontainers/selinux/go-selinux"
	"github.com/pkg/errors"
)

// ContainerRm removes the container id from the filesystem. An error
// is returned if the container is not found, or if the remove
// fails. If the remove succeeds, the container name is released, and
// network links are removed.
func (daemon *Daemon) ContainerRm(name string, config *backend.ContainerRmConfig) error {
	return daemon.containerRm(&daemon.config().Config, name, config)
}

func (daemon *Daemon) containerRm(cfg *config.Config, name string, opts *backend.ContainerRmConfig) error {
	start := time.Now()

	ctr, err := daemon.GetContainer(name)
	if err != nil {
		return err
	}

	// Container state RemovalInProgress should be used to avoid races.
	if inProgress := ctr.State.SetRemovalInProgress(); inProgress {
		err := fmt.Errorf("removal of container %s is already in progress", name)
		return errdefs.Conflict(err)
	}
	defer ctr.State.ResetRemovalInProgress()

	// check if container wasn't deregistered by previous rm since Get
	if c := daemon.containers.Get(ctr.ID); c == nil {
		return nil
	}

	if opts.RemoveLink {
		return daemon.rmLink(cfg, ctr, name)
	}

	err = daemon.cleanupContainer(ctr, *opts)
	metrics.ContainerActions.WithValues("delete").UpdateSince(start)
	if err != nil {
		return fmt.Errorf("cannot remove container %q: %w", name, err)
	}
	return nil
}

func (daemon *Daemon) rmLink(cfg *config.Config, ctr *container.Container, name string) error {
	if name[0] != '/' {
		name = "/" + name
	}
	parent, n := path.Split(name)
	if parent == "/" {
		return errors.New("Conflict, cannot remove the default link name of the container")
	}

	parent = strings.TrimSuffix(parent, "/")
	parentID, err := daemon.containersReplica.Snapshot().GetID(parent)
	if err != nil {
		return fmt.Errorf("Cannot get parent %s for link name %s", parent, name)
	}

	daemon.releaseName(name)
	if parentContainer := daemon.containers.Get(parentID); parentContainer != nil {
		daemon.linkIndex.unlink(name, ctr, parentContainer)
		if err := daemon.updateNetwork(cfg, parentContainer); err != nil {
			log.G(context.TODO()).Debugf("Could not update network to remove link %s: %v", n, err)
		}
	}
	return nil
}

// cleanupContainer unregisters a container from the daemon, stops stats
// collection and cleanly removes contents and metadata from the filesystem.
func (daemon *Daemon) cleanupContainer(ctr *container.Container, config backend.ContainerRmConfig) error {
	if ctr.State.IsRunning() {
		if !config.ForceRemove {
			if ctr.State.Paused {
				return errdefs.Conflict(errors.New("container is paused and must be unpaused first"))
			} else {
				return errdefs.Conflict(fmt.Errorf("container is %s: stop the container before removing or force remove", ctr.State.StateString()))
			}
		}
		if err := daemon.Kill(ctr); err != nil && !isNotRunning(err) {
			return fmt.Errorf("could not kill container: %w", err)
		}
	}

	// stop collection of stats for the container regardless
	// if stats are currently getting collected.
	daemon.statsCollector.StopCollection(ctr)

	// stopTimeout is the number of seconds to wait for the container to stop
	// gracefully before forcibly killing it.
	//
	// Why 3 seconds? The timeout specified here was originally added in commit
	// 1615bb08c7c3fc6c4b22db0a633edda516f97cf0, which added a custom timeout to
	// some commands, but lacking an option for a timeout on "docker rm", was
	// hardcoded to 10 seconds. Commit 28fd289b448164b77affd8103c0d96fd8110daf9
	// later on updated this to 3 seconds (but no background on that change).
	//
	// If you arrived here and know the answer, you earned yourself a picture
	// of a cute animal of your own choosing.
	stopTimeout := 3
	if err := daemon.containerStop(context.TODO(), ctr, backend.ContainerStopOptions{Timeout: &stopTimeout}); err != nil {
		return err
	}

	// Mark container dead. We don't want anybody to be restarting it.
	ctr.Lock()
	ctr.State.Dead = true

	// Copy RWLayer for releasing and clear the reference while holding the container lock.
	rwLayer := ctr.RWLayer
	ctr.RWLayer = nil

	// Save container state to disk. So that if error happens before
	// container meta file got removed from disk, then a restart of
	// docker should not make a dead container alive.
	if err := ctr.CheckpointTo(context.WithoutCancel(context.TODO()), daemon.containersReplica); err != nil && !os.IsNotExist(err) {
		log.G(context.TODO()).Errorf("Error saving dying container to disk: %v", err)
	}
	ctr.Unlock()

	// When container creation fails and `RWLayer` has not been created yet, we
	// do not call `ReleaseRWLayer`
	if rwLayer != nil {
		if err := daemon.imageService.ReleaseLayer(rwLayer); err != nil {
			// Restore the reference on error as it possibly was not released.
			ctr.Lock()
			ctr.RWLayer = rwLayer
			ctr.Unlock()

			ctr.State.SetRemovalError(err)
			return err
		}
	}

	// Hold the container lock while deleting the container root directory
	// so that other goroutines don't attempt to concurrently open files
	// within it. Having any file open on Windows (without the
	// FILE_SHARE_DELETE flag) will block it from being deleted.
	//
	// TODO(thaJeztah): should this be moved to the "container" itself, or possibly be delegated to the graphdriver or snapshotter?
	ctr.Lock()
	err := containerfs.EnsureRemoveAll(ctr.Root)
	ctr.Unlock()
	if err != nil {
		err = errors.Wrap(err, "unable to remove filesystem")
		ctr.State.SetRemovalError(err)
		return err
	}

	linkNames := daemon.linkIndex.delete(ctr)
	selinux.ReleaseLabel(ctr.ProcessLabel)
	daemon.containers.Delete(ctr.ID)
	daemon.containersReplica.Delete(ctr)
	if err := daemon.removeMountPoints(ctr, config.RemoveVolume); err != nil {
		log.G(context.TODO()).Error(err)
	}
	for _, name := range linkNames {
		daemon.releaseName(name)
	}

	ctr.State.SetRemoved()
	metrics.StateCtr.Delete(ctr.ID)
	daemon.LogContainerEvent(ctr, events.ActionDestroy)
	return nil
}