The Container.State struct holds the container's state, and most of
its fields are expected to change dynamically. Some o these state-changes
are explicit, for example, setting the container to be "stopped". Other
state changes can be more explicit, for example due to the containers'
process exiting or being "OOM" killed by the kernel.
The distinction between explicit ("desired") state changes and "state"
("actual state") is sometimes vague; for some properties, we clearly
separated them, for example if a user requested the container to be
stopped or restarted, we store state in the Container object itself;
HasBeenManuallyStopped bool // used for unless-stopped restart policy
HasBeenManuallyRestarted bool `json:"-"` // used to distinguish restart caused by restart policy from the manual one
Other properties are more ambiguous. such as "HasBeenStartedBefore" and
"RestartCount", which are stored on the Container (and persisted to
disk), but may be more related to "actual" state, and likely should
not be persisted;
RestartCount int
HasBeenStartedBefore bool
Given that (per the above) concurrency must be taken into account, most
changes to the `container.State` struct should be protected; here's where
things get blurry. While the `State` type provides various accessor methods,
only some of them take concurrency into account; for example, [State.IsRunning]
and [State.GetPID] acquire a lock, whereas [State.ExitCodeValue] does not.
Even the (commonly used) [State.StateString] has no locking at all.
The way to handle this is error-prone; [container.State] contains a mutex,
and it's exported. Given that its embedded in the [container.Container]
struct, it's also exposed as an exported mutex for the container. The
assumption here is that by "merging" the two, the caller to acquire a lock
when either the container _or_ its state must be mutated. However, because
some methods on `container.State` handle their own locking, consumers must
be deeply familiar with the internals; if both changes to the `Container`
AND `Container.State` must be made. This gets amplified more as some
(exported!) methods, such as [container.SetRunning] mutate multiple fields,
but don't acquire a lock (so expect the caller to hold one), but their
(also exported) counterpart (e.g. [State.IsRunning]) do.
It should be clear from the above, that this needs some architectural
changes; a clearer separation between "desired" and "actual" state (opening
the potential to update the container's config without manually touching
its `State`), possibly a method to obtain a read-only copy of the current
state (for those querying state), and reviewing which fields belong where
(and should be persisted to disk, or only remain in memory).
This PR preserves the status quo; it makes no structural changes, other
than exposing where we access the container's state. Where previously the
State fields and methods were referred to as "part of the container"
(e.g. `ctr.IsRunning()` or `ctr.Running`), we now explicitly reference
the embedded `State` (`ctr.State.IsRunning`, `ctr.State.Running`).
The exception (for now) is the mutex, which is still referenced through
the embedded struct (`ctr.Lock()` instead of `ctr.State.Lock()`), as this
is (mostly) by design to protect the container, and what's in it (including
its `State`).
[State.IsRunning]: c4afa77157/daemon/container/state.go (L205-L209)
[State.GetPID]: c4afa77157/daemon/container/state.go (L211-L216)
[State.ExitCodeValue]: c4afa77157/daemon/container/state.go (L218-L228)
[State.StateString]: c4afa77157/daemon/container/state.go (L102-L131)
[container.State]: c4afa77157/daemon/container/state.go (L15-L23)
[container.Container]: c4afa77157/daemon/container/container.go (L67-L75)
[container.SetRunning]: c4afa77157/daemon/container/state.go (L230-L277)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The environment variables set by legacy links are not particularly
useful because you need to know the name of the linked container to use
them, or you need to scan all enviornment variables to find them.
Legacy links are deprecated / marked "legacy" since a long time, and we
want to replace them with non-legacy links. This will help make the
default bridge work like custom networks.
For now, stop setting these environment variables inside of linking
containers by default, but provide an escape hatch to allow users who
still rely on these to re-enable them.
The integration-cli tests `TestExecEnvLinksHost` and `TestLinksEnvs` are
removed as they need to run against a daemon with legacy links env vars
enabled, and a new integration test`TestLegacyLinksEnvVars` is added to
fill the gap. Similarly, the docker-py test `test_create_with_links` is
skipped.
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
The buildSandboxPlatformOptions function was given a pointer to the
sboxOptions and modified it in-place.
Similarly, a pointer to the container was passed and `container.HostsPath`
and `container.ResolvConfPath` mutated. In cases where either of those
failed, we would return an error, but the container (and sboxOptions)
would already be modified.
This patch;
- updates the signature of buildSandboxPlatformOptions to return a fresh
slice of sandbox options, which can be appended to the sboxOptions by
the caller.
- uses intermediate variables for `hostsPath` and `resolvConfPath`, and
only mutates the container if both were obtained successfully.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The stringid package is used in many places; while it's trivial
to implement a similar utility, let's just provide it as a utility
package in the client, removing the daemon-specific logic.
For integration tests, I opted to use the implementation in the
client, as those should not ideally not make assumptions about
the daemon implementation.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
These comments were added to enforce using the correct import path for
our packages ("github.com/docker/docker", not "github.com/moby/moby").
However, when working in go module mode (not GOPATH / vendor), they have
no effect, so their impact is limited.
Remove these imports in preparation of migrating our code to become an
actual go module.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This function was unconditionally trying to fetch linked container, even
if the container was not using the default bridge (the only network that
supports legacy links).
Also removing the intermediate variable, as daemon.children, through
daemon.linkindex.children already returns a variable with a copy of these
links.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
If config for legacy links needs to be added to a libnetwork.Sandbox,
add it when constructing the Endpoint that needs it - removing the
constraint on ordering of Endpoint construction, and the dependency
between Endpoint and Sandbox construction.
So, now a Sandbox can be constructed in one place, before the first
Endpoint.
Signed-off-by: Rob Murray <rob.murray@docker.com>
This reverts commit 18f4f775ed.
Because buildkit doesn't run an internal resolver, and it bases its
/etc/resolv.conf on the host's ... when buildkit is run in a container
that has 'nameserver 127.0.0.11', its build containers will use Google's
DNS servers as a fallback (unless the build container uses host
networking).
Before, when the 127.0.0.11 resolver was not used for the default network,
the buildkit container would have inherited a site-local nameserver. So,
the build containers it created would also have inherited that DNS
server - and they'd be able to resolve site-local hostnames.
By replacing the site-local nameserver with Google's, we broke access
to local DNS and its hostnames.
Signed-off-by: Rob Murray <rob.murray@docker.com>
This function returns the default network to use for the daemon platform;
moving this to a location separate from runconfig, which is planned to
be dismantled and moved to the API.
While it might be convenient to move this utility inside api/types/container,
we don't want to advertise this function too widely, as the default returned
can ONLY be considered correct when ran on the daemon-side. An alternative
would be to introduce an argument (daemonPlatform), which isn't very convenient
to use.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Not a full list yet, but renaming to prevent shadowing, and to use a more
consistent short form (ctr for container).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Until now, containers on the default bridge network have been configured
to talk directly to external DNS servers - their resolv.conf files have
either been populated with nameservers from the host's resolv.conf, or
with servers from '--dns' (or with Google's nameservers as a fallback).
This change makes the internal bridge more like other networks by using
the internal resolver. But, the internal resolver is not populated with
container names or aliases - it's only for external DNS lookups.
Containers on the default network, on a host that has a loopback
resolver (like systemd's on 127.0.0.53) will now use that resolver
via the internal resolver. So, the logic used to find systemd's current
set of resolvers is no longer needed by the daemon.
Legacy links work just as they did before, using '/etc/hosts' and magic.
(Buildkit does not use libnetwork, so it can't use the internal resolver.
But it does use libnetwork/resolvconf's logic to configure resolv.conf.
So, code to set up resolv.conf for a legacy networking without an internal
resolver can't be removed yet.)
Signed-off-by: Rob Murray <rob.murray@docker.com>
Make the internal DNS resolver for Windows containers forward requests
to upsteam DNS servers when it cannot respond itself, rather than
returning SERVFAIL.
Windows containers are normally configured with the internal resolver
first for service discovery (container name lookup), then external
resolvers from '--dns' or the host's networking configuration.
When a tool like ping gets a SERVFAIL from the internal resolver, it
tries the other nameservers. But, nslookup does not, and with this
change it does not need to.
The internal resolver learns external server addresses from the
container's HNSEndpoint configuration, so it will use the same DNS
servers as processes in the container.
The internal resolver for Windows containers listens on the network's
gateway address, and each container may have a different set of external
DNS servers. So, the resolver uses the source address of the DNS request
to select external resolvers.
On Windows, daemon.json feature option 'windows-no-dns-proxy' can be used
to prevent the internal resolver from forwarding requests (restoring the
old behaviour).
Signed-off-by: Rob Murray <rob.murray@docker.com>
Replace regex matching/replacement and re-reading of generated files
with a simple parser, and struct to remember and manipulate the file
content.
Annotate the generated file with a header comment saying the file is
generated, but can be modified, and a trailing comment describing how
the file was generated and listing external nameservers.
Always start with the host's resolv.conf file, whether generating config
for host networking, or with/without an internal resolver - rather than
editing a file previously generated for a different use-case.
Resolves an issue where rewrites of the generated file resulted in
default IPv6 nameservers being unnecessarily added to the config.
Signed-off-by: Rob Murray <rob.murray@docker.com>
The github.com/containerd/containerd/log package was moved to a separate
module, which will also be used by upcoming (patch) releases of containerd.
This patch moves our own uses of the package to use the new module.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
So far, only a subset of NetworkingConfig was validated when calling
ContainerCreate. Other parameters would be validated when the container
was started. And the same goes for EndpointSettings on NetworkConnect.
This commit adds two validation steps:
1. Check if the IP addresses set in endpoint's IPAMConfig are valid,
when ContainerCreate and ConnectToNetwork is called ;
2. Check if the network allows static IP addresses, only on
ConnectToNetwork as we need the libnetwork's Network for that and it
might not exist until NetworkAttachment requests are sent to the
Swarm leader (which happens only when starting the container) ;
Signed-off-by: Albin Kerouanton <albinker@gmail.com>
- Most error-message returned would already include "container" and the
container ID in the error-message (e.g. "container %s is not running"),
so there's no need to add a custom prefix for that.
- os.Stat returns a PathError, which already includes the operation ("stat"),
the path, and the underlying error that occurred.
And while updating, let's also fix the name to be proper camelCase :)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This function didn't need the whole container, only its ID, so let's
use that as argument. This also makes it consistent with getIpcContainer.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
`Daemon.getPidContainer()` was wrapping the error-message with a message
("cannot join PID of a non running container") that did not reflect the
actual reason for the error; `Daemon.GetContainer()` could either return
an invalid parameter (invalid / empty identifier), or a "not found" error
if the specified container-ID could not be found.
In the latter case, we don't want to return a "not found" error through
the API, as this would indicate that the container we're _starting_ was
not found (which is not the case), so we need to convert the error into
an `errdefs.ErrInvalidParameter` (the container-ID specified for the PID
namespace is invalid if the container doesn't exist).
This logic is similar to what we do for IPC namespaces. which received
a similar fix in c3d7a0c603.
This patch updates the error-types, and moves them into the getIpcContainer
and getPidContainer container functions, both of which should return
an "invalid parameter" if the container was not found.
It's worth noting that, while `WithNamespaces()` may return an "invalid
parameter" error, the `start` endpoint itself may _not_ be. as outlined
in commit bf1fb97575, starting a container
that has an invalid configuration should be considered an internal server
error, and is not an invalid _request_. However, for uses other than
container "start", `WithNamespaces()` should return the correct error
to allow code to handle it accordingly.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This was added in 12485d62ee to save some
duplication, but was really over-engineered to save a few lines of code,
at the cost of hiding away what it does and also potentially returning
inconsistent errors (not addressed in this patch). Let's start with
inlining these.
This removes;
- Daemon.checkContainer
- daemon.containerIsRunning
- daemon.containerIsNotRestarting
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Ensure data-race-free access to the daemon configuration without
locking by mutating a deep copy of the config and atomically storing
a pointer to the copy into the daemon-wide configStore value. Any
operations which need to read from the daemon config must capture the
configStore value only once and pass it around to guarantee a consistent
view of the config.
Signed-off-by: Cory Snider <csnider@mirantis.com>
If the file doesn't exist, the process isn't running, so we should be able
to ignore that.
Also remove an intermediate variable.
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
Do not use 0701 perms.
0701 dir perms allows anyone to traverse the docker dir.
It happens to allow any user to execute, as an example, suid binaries
from image rootfs dirs because it allows traversal AND critically
container users need to be able to do execute things.
0701 on lower directories also happens to allow any user to modify
things in, for instance, the overlay upper dir which neccessarily
has 0755 permissions.
This changes to use 0710 which allows users in the group to traverse.
In userns mode the UID owner is (real) root and the GID is the remapped
root's GID.
This prevents anyone but the remapped root to traverse our directories
(which is required for userns with runc).
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit ef7237442147441a7cadcda0600be1186d81ac73)
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit 93ac040bf0)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
After moving libnetwork to this repo, we need to update all the import
paths for libnetwork to point to docker/docker/libnetwork instead of
docker/libnetwork.
This change implements that.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Various dirs in /var/lib/docker contain data that needs to be mounted
into a container. For this reason, these dirs are set to be owned by the
remapped root user, otherwise there can be permissions issues.
However, this uneccessarily exposes these dirs to an unprivileged user
on the host.
Instead, set the ownership of these dirs to the real root (or rather the
UID/GID of dockerd) with 0701 permissions, which allows the remapped
root to enter the directories but not read/write to them.
The remapped root needs to enter these dirs so the container's rootfs
can be configured... e.g. to mount /etc/resolve.conf.
This prevents an unprivileged user from having read/write access to
these dirs on the host.
The flip side of this is now any user can enter these directories.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
(cherry picked from commit e908cc3901)
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
`os.RemoveAll()` should never return this error. From the docs:
> If the path does not exist, RemoveAll returns nil (no error).
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
This came up in a review of a5324d6950, but
for some reason that comment didn't find its way to GitHub, and/or I
forgot to push the change.
These files are "copied" by reading their content with ioutil.Readfile(),
resolving the symlinks should therefore not be needed, and paths can be
passed as-is;
```go
func copyFile(src, dst string) error {
sBytes, err := ioutil.ReadFile(src)
if err != nil {
return err
}
return ioutil.WriteFile(dst, sBytes, filePerm)
}
```
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>