When running the docker daemon with `--selinux-enabled`, access to the docker
socket is prevented by SELinux. To access the socket, the container must be
started with `--privileged`, with SELinux disabled (`--security-opt label=disable`),
or with (e.g.) `--security-opt label=type:container_runtime_t`, which gives
it access to files restricted to the runtime ( `dockerd` daemon) itself.
While having access to the docker socket grants full `root` permissions on
the host (e.g. through starting a privileged container using the socket),
it may be preferable to restrict the container to just the socket.
This patch adds a `docker_client.process` SELinux CIL policy module that
defines a container domain (process type). It inherits the base container
template and grants the permissions needed to use the docker socket.
Without this (and the daemon running with `--selinux-enabled`);
docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock docker:cli -H unix:///var/run/docker.sock version
Client:
Version: 28.4.0
API version: 1.51
Go version: go1.24.7
Git commit: d8eb465
Built: Wed Sep 3 20:56:28 2025
OS/Arch: linux/amd64
Context: default
permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.51/version": dial unix /var/run/docker.sock: connect: permission denied
With this:
semodule -i /usr/share/udica/templates/base_container.cil
semodule -i ./contrib/selinux/docker_client.cil
docker run -it --rm -v /var/run/docker.sock:/var/run/docker.sock --security-opt label=type:docker_client.process docker:cli -H unix:///var/run/docker.sock version
Client:
Version: 28.4.0
API version: 1.51
Go version: go1.24.7
Git commit: d8eb465
Built: Wed Sep 3 20:56:28 2025
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 28.4.0
API version: 1.51 (minimum version 1.24)
Go version: go1.24.7
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
full diff: 2e043c6bd6...0ea5ed0382
Notable changes:
- Revert "Change /dev to be mounted by default with /noexec". Mounting /dev
with 'noexec' option triggers problems when containers try to create Intel
SGX enclaves: [runtime-tools@0524bb2]
- Switch to github.com/moby/sys/capability [runtime-tools@c2dadba]
[runtime-tools@0524bb2]: 0524bb2cf6
[runtime-tools@c2dadba]: c2dadba13f
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Allow tests to run in parallel with separate network namespaces,
without modifying the global-state namespace/netlink handles in
the "ns" package ... only useful for tests that don't depend on
package "ns".
Use the new option in iptabler/nftabler tests.
Signed-off-by: Rob Murray <rob.murray@docker.com>
The `BridgeNfIptables` and `BridgeNfIp6tables` were removed in API v1.50
in commit 6505d3877c, and only returned in
lower API versions.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
SetupTestOSContextEx calls 'ns.Init' (which, outside tests, is protected
by a sync.Once), and it's called again by the returned OSContext.Cleanup
method. That overwrites the ns package's namespace and netlink handles
(initNs and initNl) without closing them.
Because SetupTestOSContextEx changes that shared state, it should not
be used in parallel tests. So, rather than trying to close the handles
in ns.Init if already open - un-export Init so it's always called via
its sync.Once, and add a reset function for tests to use. Have
SetupTestOSContextEx claim a mutex to avoid crashy surprises or
hard to catch issues where the ns package isn't using the expected
namespace if it is used in parallel tests.
Signed-off-by: Rob Murray <rob.murray@docker.com>
The PushImage method for the ImageService used positional arguments for its
options, which made it more difficult to introduce new options. This patch
introduces a `PushOptions` struct to specify the options. As part of these
changes, the `platform` option was already adjusted to accept a slice of
platforms, which currently is not supported, but may be in the near future.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The PullImage method for the ImageService used positional arguments for its
options, which made it more difficult to introduce new options. This patch
introduces a `PullOptions` struct to specify the options. As part of these
changes, the `platform` option was already adjusted to accept a slice of
platforms, which currently is not supported, but may be in the near future.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Mirantis Container Runtime 23.0 reached EOL on May 19, and the 23.0
branch is no longer maintained.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Use "apiClient" for the client (most places use either `apiClient`
or `c`) to prevent shadowing the `client` import.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Commit [moby@17d870b] (API v1.13, docker v1.1.0) changed the default to pause
containers during commit, keeping the behavior opt-in for older API versions.
This version-gate was removed in [moby@1b1147e] because API versions lower
than v1.23 were no longer supported.
However, the `CreateImageConfig` struct still used `Pause`, and required opting-
in to enable pausing. This patch changes the struct to reflect the default.
after this change, we should also consider changing the API make disabling
pause a more explicit option, and to change the "pause" argument to a
"no-pause".
[moby@17d870b]: 17d870bed5
[moby@1b1147e]: 1b1147e46b
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Commit [moby@17d870b] (API v1.13, docker v1.1.0) changed the default to pause
containers during commit, keeping the behavior opt-in for older API versions.
This version-gate was removed in [moby@1b1147e] because API versions lower
than v1.23 were no longer supported.
However, the client still required opting-in to pausing containers, which
is handled by setting the `Pause` field to true by default. This patch changes
the client option to reflect the default; after this change, we should also
consider changing the API make disabling pause a more explicit option, and
to change the "pause" argument to a "no-pause".
[moby@17d870b]: 17d870bed5
[moby@1b1147e]: 1b1147e46b
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Trim any v-prefix passed to this function to make sure we detect empty
API versions.
In most cases, the ping-response will originate from the API server, but
the exported `Client.NegotiateAPIVersionPing` allows a ping-response to
be passed manually.
While updating, also update the signature to only accept the version, as
only the `PingResponse.APIVersion` is used by this function.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Validate that the client is connecting with the expected endpoint path and
method(s). Also fix the Api-Version response to align with the actual format
returned, which doesn't include a "v" prefix;
curl -sI --unix-socket /var/run/docker.sock 'http://localhost/_ping' | grep 'Api-Version'
Api-Version: 1.51
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This method was introduced in [moby@5a84124] related to the (now removed)
support for "compose on kubernetes" in the CLI. This functionality extended
the CLI with endpoints that are not part of the engine API, but re-using
the HTTP-client with the same (TLS) config as the CLI itself.
While such scenarios may be something to consider in future (i.e. more easily
extend the API with custom endpoints), this method is not currently used,
but defined as part of the CLI's interface. This patch removes the method
for now, so that we can design from a clean slate in case we need this
extensibility, instead of keeping methods that were added ad-hoc around.
[moby@5a84124]: 5a84124739
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This method did not provide any special handling for accessing the
field, and did not handle locking. Let's remove it for now to
not pretend we're doing anything more safe than directly accessing
the field.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This constructor did not do anything other than creating an empty struct
for an exported type. While we should look at initializing with a proper
state, we currently do not, so let's not pretend we do some magic here,
and leave it for a future exercise to create a proper constructor if we
need one.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The State.Health struct has a mutex, but in various places
we access the embedded Health struct directly.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The Container.State struct holds the container's state, and most of
its fields are expected to change dynamically. Some o these state-changes
are explicit, for example, setting the container to be "stopped". Other
state changes can be more explicit, for example due to the containers'
process exiting or being "OOM" killed by the kernel.
The distinction between explicit ("desired") state changes and "state"
("actual state") is sometimes vague; for some properties, we clearly
separated them, for example if a user requested the container to be
stopped or restarted, we store state in the Container object itself;
HasBeenManuallyStopped bool // used for unless-stopped restart policy
HasBeenManuallyRestarted bool `json:"-"` // used to distinguish restart caused by restart policy from the manual one
Other properties are more ambiguous. such as "HasBeenStartedBefore" and
"RestartCount", which are stored on the Container (and persisted to
disk), but may be more related to "actual" state, and likely should
not be persisted;
RestartCount int
HasBeenStartedBefore bool
Given that (per the above) concurrency must be taken into account, most
changes to the `container.State` struct should be protected; here's where
things get blurry. While the `State` type provides various accessor methods,
only some of them take concurrency into account; for example, [State.IsRunning]
and [State.GetPID] acquire a lock, whereas [State.ExitCodeValue] does not.
Even the (commonly used) [State.StateString] has no locking at all.
The way to handle this is error-prone; [container.State] contains a mutex,
and it's exported. Given that its embedded in the [container.Container]
struct, it's also exposed as an exported mutex for the container. The
assumption here is that by "merging" the two, the caller to acquire a lock
when either the container _or_ its state must be mutated. However, because
some methods on `container.State` handle their own locking, consumers must
be deeply familiar with the internals; if both changes to the `Container`
AND `Container.State` must be made. This gets amplified more as some
(exported!) methods, such as [container.SetRunning] mutate multiple fields,
but don't acquire a lock (so expect the caller to hold one), but their
(also exported) counterpart (e.g. [State.IsRunning]) do.
It should be clear from the above, that this needs some architectural
changes; a clearer separation between "desired" and "actual" state (opening
the potential to update the container's config without manually touching
its `State`), possibly a method to obtain a read-only copy of the current
state (for those querying state), and reviewing which fields belong where
(and should be persisted to disk, or only remain in memory).
This PR preserves the status quo; it makes no structural changes, other
than exposing where we access the container's state. Where previously the
State fields and methods were referred to as "part of the container"
(e.g. `ctr.IsRunning()` or `ctr.Running`), we now explicitly reference
the embedded `State` (`ctr.State.IsRunning`, `ctr.State.Running`).
The exception (for now) is the mutex, which is still referenced through
the embedded struct (`ctr.Lock()` instead of `ctr.State.Lock()`), as this
is (mostly) by design to protect the container, and what's in it (including
its `State`).
[State.IsRunning]: c4afa77157/daemon/container/state.go (L205-L209)
[State.GetPID]: c4afa77157/daemon/container/state.go (L211-L216)
[State.ExitCodeValue]: c4afa77157/daemon/container/state.go (L218-L228)
[State.StateString]: c4afa77157/daemon/container/state.go (L102-L131)
[container.State]: c4afa77157/daemon/container/state.go (L15-L23)
[container.Container]: c4afa77157/daemon/container/container.go (L67-L75)
[container.SetRunning]: c4afa77157/daemon/container/state.go (L230-L277)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>