Commit Graph

66 Commits

Author SHA1 Message Date
Sebastiaan van Stijn
c5991341eb remove support for deprecated kernel memory limit
Kernel-memory limits are not supported in cgroups v2, and were obsoleted in
[kernel v5.4], producing an `ENOTSUP` error in kernel v5.16. Support for this option
was removed in runc and other runtimes, as various LTS kernels contained a
broken implementation, resulting in unpredictable behavior.

We deprecated this option in [moby@b8ca7de], producing a warning when used,
and actively ignore the option since [moby@0798f5f].

Given that setting this option had no effect in most situations, we should
just remove this option instead of continuing to handle it with the expectation
that a runtime may still support it.

Note that we still support RHEL 8 (kernel 4.18) and RHEL 9 (kernel 5.14). We
no longer build packages for Ubuntu 20.04 (kernel 5.4) and Debian Bullseye 11
(kernel 5.10), which still have an LTS / ESM programme, but for those the
removal only affects setups that use a runtime that still supports the option
together with an old API version.

[kernel v5.4]: https://github.com/torvalds/linux/commit/0158115f702b0ba208ab0
[moby@b8ca7de]: b8ca7de823
[moby@0798f5f]: 0798f5f5cf

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-09-16 13:08:36 +02:00
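To make the kernel versions in commit c5991341eb easier to compare, here is a minimal sketch that classifies a kernel version by how it treats the cgroup v1 kernel-memory limit, using only the thresholds named in that commit message (obsoleted in 5.4, `ENOTSUP` from 5.16). The `kmemStatus` helper is hypothetical and applies to cgroups v1 hosts only; cgroups v2 does not support kernel-memory limits at all.

```go
package main

import "fmt"

// kmemStatus is a hypothetical helper that classifies a kernel version by
// how it handles the cgroup v1 kernel-memory limit, based only on the
// versions named in the commit message (obsoleted in 5.4, ENOTSUP in 5.16).
func kmemStatus(major, minor int) string {
	switch {
	case major > 5 || (major == 5 && minor >= 16):
		return "ENOTSUP"
	case major == 5 && minor >= 4:
		return "deprecated"
	default:
		// Older kernels accept the limit, but some LTS kernels shipped a
		// broken implementation.
		return "supported"
	}
}

func main() {
	// RHEL 8, Ubuntu 20.04, Debian 11, RHEL 9, and two newer kernels.
	for _, v := range [][2]int{{4, 18}, {5, 4}, {5, 10}, {5, 14}, {5, 16}, {6, 1}} {
		fmt.Printf("%d.%d: %s\n", v[0], v[1], kmemStatus(v[0], v[1]))
	}
}
```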
Sebastiaan van Stijn
18a1b61b49 pkg/sysinfo: remove // import comments
These comments were added to enforce using the correct import path for
our packages ("github.com/docker/docker", not "github.com/moby/moby").
However, when working in go module mode (not GOPATH / vendor), they have
no effect, so their impact is limited.

Remove these imports in preparation of migrating our code to become an
actual go module.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-05-30 15:59:18 +02:00
Derek McGowan
0aa8fe0bf9 Update to containerd v2.0.2, buildkit v0.19.0-rc2
Update the buildkit version to a commit that uses containerd 2.0.

Signed-off-by: Derek McGowan <derek@mcg.dev>
2025-01-15 14:09:30 +01:00
Sebastiaan van Stijn
1359772433 pkg/sysinfo: parse cpuset.cpus/mems once and memoize
Preserve the result instead of parsing these for each container that
specifies cpuset options.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-01-06 17:53:25 +01:00
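The memoization in commit 1359772433 can be pictured with a minimal sketch using `sync.OnceValues` (Go 1.21+), which runs a function once and caches both return values. This is not the code from that commit: `loadCpusets` is a hypothetical stand-in for reading and parsing `cpuset.cpus` and `cpuset.mems`, and `parseCount` exists only to show that the work happens once.

```go
package main

import (
	"fmt"
	"sync"
)

// parseCount records how many times the (expensive) load actually runs.
var parseCount int

// loadCpusets is a hypothetical stand-in for reading and parsing
// cpuset.cpus and cpuset.mems from the cgroup filesystem.
func loadCpusets() (cpus, mems string) {
	parseCount++
	return "0-3", "0"
}

// cpusets runs loadCpusets on the first call and returns the cached
// result on every later call. sync.OnceValues requires Go 1.21+.
var cpusets = sync.OnceValues(loadCpusets)

func main() {
	// Simulate several containers that specify cpuset options.
	for i := 0; i < 3; i++ {
		cpus, mems := cpusets()
		fmt.Println("container", i, "cpus:", cpus, "mems:", mems)
	}
	fmt.Println("parsed", parseCount, "time(s)")
}
```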
Sebastiaan van Stijn
2282279180 pkg/sysinfo: internalize parsing cpusets
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-01-06 10:46:42 +01:00
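For context on what "parsing cpusets" involves, here is a minimal sketch of parsing the Linux cpuset list format used by `cpuset.cpus` and `cpuset.mems` (for example `0-3,8,10-11`) and checking that a requested set is available on the host. Both `parseCPUSet` and `isSubset` are hypothetical helpers, not the functions used in pkg/sysinfo.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUSet parses a cpuset list such as "0-3,8,10-11" into the set of
// IDs it contains. Empty input yields an empty set. Hypothetical helper.
func parseCPUSet(list string) (map[int]bool, error) {
	ids := make(map[int]bool)
	list = strings.TrimSpace(list)
	if list == "" {
		return ids, nil
	}
	for _, part := range strings.Split(list, ",") {
		first, last, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(first)
		if err != nil {
			return nil, fmt.Errorf("invalid cpuset %q", list)
		}
		end := start
		if isRange {
			end, err = strconv.Atoi(last)
			if err != nil || end < start {
				return nil, fmt.Errorf("invalid cpuset %q", list)
			}
		}
		for id := start; id <= end; id++ {
			ids[id] = true
		}
	}
	return ids, nil
}

// isSubset reports whether every ID in requested is also in available,
// e.g. to check a container's --cpuset-cpus against the host's CPUs.
func isSubset(requested, available string) (bool, error) {
	req, err := parseCPUSet(requested)
	if err != nil {
		return false, err
	}
	avail, err := parseCPUSet(available)
	if err != nil {
		return false, err
	}
	for id := range req {
		if !avail[id] {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	fmt.Println(isSubset("1,3", "0-3"))
	fmt.Println(isSubset("4", "0-3"))
}
```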
Sebastiaan van Stijn
4597396cb5 pkg/sysinfo: define const for default Max CPUs
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-01-06 10:46:29 +01:00
Sebastiaan van Stijn
799501d172 pkg/sysinfo: rename vars/arguments for clarity
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-01-06 10:40:09 +01:00
Sebastiaan van Stijn
0d51680f91 pkg/sysinfo: stub out parsing cpusets on non-linux
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-01-06 10:40:08 +01:00
Sebastiaan van Stijn
547151abd2 pkg/sysinfo: cleanup tests
- use t.TempDir()
- combine various tests to check if New() sets expected values instead
  of skipping tests when not.
- remove gotest.tools, as it was only used minimally
- replace uses of "path" for filepath operations.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-01-02 16:51:42 +01:00
Sebastiaan van Stijn
8991c4e382 Deprecate BridgeNfIptables and BridgeNfIp6tables fields
The netfilter module is now loaded on-demand, and no longer during daemon
startup, making these fields obsolete. These fields are now always `false`
and will be removed in the next release.

This patch deprecates:

- the `BridgeNfIptables` field in `api/types/system.Info`
- the `BridgeNfIp6tables` field in `api/types/system.Info`
- the `BridgeNFCallIPTablesDisabled` field in `pkg/sysinfo.SysInfo`
- the `BridgeNFCallIP6TablesDisabled` field in `pkg/sysinfo.SysInfo`

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-12-16 22:10:05 +01:00
Sebastiaan van Stijn
cff4f20c44 migrate to github.com/containerd/log v0.1.0
The github.com/containerd/containerd/log package was moved to a separate
module, which will also be used by upcoming (patch) releases of containerd.

This patch moves our own uses of the package to use the new module.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-10-11 17:52:23 +02:00
Brian Goff
74da6a6363 Switch all logging to use containerd log pkg
This unifies our logging and allows us to propagate logging and trace
contexts together.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2023-06-24 00:23:44 +00:00
Akihiro Suda
e807ae4f2e vendor: github.com/containerd/cgroups/v3 v3.0.1
Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2023-03-08 20:15:17 +09:00
Sebastiaan van Stijn
5d10c6ec67 Update handling of deprecated kernel (tcp) memory options
- Omit `KernelMemory` and `KernelMemoryTCP` fields in `/info` response if they're
  not supported, or when using API v1.42 or up.
- Re-enable detection of `KernelMemory` (as it's still needed for older API versions)
- Remove warning about kernel memory TCP in daemon logs (a warning is still returned
  by the `/info` endpoint, but we can consider removing that).
- Prevent incorrect "Minimum kernel memory limit allowed" error if the value was
  reset because it's not supported by the host.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-03-17 09:56:39 +01:00
aiordache
af6307fbda Remove KernelMemory option from /containers/create and /update endpoints
- remove KernelMemory option from `v1.42` API docs
- remove KernelMemory warning on `/info`
- update changes for `v1.42`
- remove `KernelMemory` field from endpoints docs

Signed-off-by: aiordache <anca.iordache@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-03-17 09:55:36 +01:00
Cory Snider
b0b71dbe1c pkg/sysinfo: remove libcontainer dependency
Reimplement GetCgroupMounts using the github.com/containerd/cgroups and
github.com/moby/sys/mountinfo packages.

Signed-off-by: Cory Snider <csnider@mirantis.com>
2022-03-07 18:09:09 -05:00
Akihiro Suda
fecf45b09a Merge pull request #42796 from thaJeztah/containerd_seccomp_check
pkg/sysinfo: use containerd/pkg/seccomp.IsEnabled()
2021-08-29 03:05:59 +09:00
Sebastiaan van Stijn
accec292c1 pkg/sysinfo: use containerd/pkg/seccomp.IsEnabled()
This replaces the local SeccompSupported() utility with the implementation in containerd,
which performs the same check.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-08-27 15:21:52 +02:00
Eng Zer Jun
c55a4ac779 refactor: move from io/ioutil to io and os package
The io/ioutil package has been deprecated in Go 1.16. This commit
replaces the existing io/ioutil functions with their new definitions in
io and os packages.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2021-08-27 14:56:57 +08:00
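As a quick reference for the migration in commit c55a4ac779, the sketch below exercises the `io` and `os` replacements for common `io/ioutil` functions. The file name and contents are only illustrative. One behavioral difference worth knowing: `os.ReadDir` returns `[]fs.DirEntry`, whereas `ioutil.ReadDir` returned `[]fs.FileInfo`.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// roundTrip writes content to name inside a fresh temporary directory,
// reads it back, and counts the directory entries, using the io/os
// replacement for each former io/ioutil function.
func roundTrip(name, content string) (string, int, error) {
	dir, err := os.MkdirTemp("", "ioutil-migration") // was ioutil.TempDir
	if err != nil {
		return "", 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, name)
	if err := os.WriteFile(path, []byte(content), 0o644); err != nil { // was ioutil.WriteFile
		return "", 0, err
	}
	data, err := os.ReadFile(path) // was ioutil.ReadFile
	if err != nil {
		return "", 0, err
	}
	// was ioutil.ReadDir; os.ReadDir returns []fs.DirEntry, not []fs.FileInfo.
	entries, err := os.ReadDir(dir)
	if err != nil {
		return "", 0, err
	}
	return string(data), len(entries), nil
}

func main() {
	got, n, err := roundTrip("cpuset.cpus", "0-3\n")
	fmt.Printf("%q %d %v\n", got, n, err)

	all, err := io.ReadAll(strings.NewReader("rw,memory")) // was ioutil.ReadAll
	fmt.Println(string(all), err)
}
```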
Sebastiaan van Stijn
9b795c3e50 pkg/sysinfo.New(), daemon.RawSysInfo(): remove "quiet" argument
The "quiet" argument was only used in a single place (at daemon startup), and
every other use had to pass "false" to prevent this function from logging
warnings.

Now that SysInfo contains the warnings that occurred when collecting the
system information, we can leave it up to the caller to use those
warnings (and log them if wanted).

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-14 23:10:07 +02:00
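Commits 1fb62f455c and 9b795c3e50 together move logging out of the collector: warnings are recorded on the struct and the caller decides whether to log them, which makes a "quiet" argument unnecessary. A minimal sketch of that pattern; `sysInfo` and `probe` are hypothetical stand-ins (the booleans replace real kernel checks), and the warning texts are the ones quoted in commit ff42a2eb41 further down.

```go
package main

import (
	"fmt"
	"log"
)

// sysInfo is a hypothetical, cut-down stand-in for pkg/sysinfo.SysInfo.
// Instead of logging while probing, it records warnings for the caller.
type sysInfo struct {
	SwapLimit   bool
	CPURealtime bool
	Warnings    []string
}

// probe is a hypothetical collector; the two booleans stand in for real
// kernel and cgroup checks.
func probe(hasSwap, hasRT bool) *sysInfo {
	info := &sysInfo{SwapLimit: hasSwap, CPURealtime: hasRT}
	if !hasSwap {
		info.Warnings = append(info.Warnings, "Your kernel does not support swap memory limit")
	}
	if !hasRT {
		info.Warnings = append(info.Warnings, "Your kernel does not support cgroup rt period")
	}
	return info
}

func main() {
	// At daemon startup, the caller chooses to log the warnings...
	for _, w := range probe(false, true).Warnings {
		log.Println("WARNING:", w)
	}
	// ...while other callers can simply ignore them.
	fmt.Println(len(probe(false, false).Warnings), "warning(s) collected")
}
```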
Sebastiaan van Stijn
1fb62f455c pkg/sysinfo: collect warnings in SysInfo struct
This allows the warnings to be consumed in other locations.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-14 17:28:25 +02:00
Sebastiaan van Stijn
208d3c6efb pkg/sysinfo: move cg2Controllers to be a field in SysInfo and unify v1/v2
We pass the SysInfo struct to all functions. Adding cg2Controllers as a
(non-exported) field makes passing around this information easier.

Now that infoCollector and infoCollectorV2 have the same signature, we can
simplify some bits and use a single slice for all "collectors".

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-14 16:39:44 +02:00
Sebastiaan van Stijn
5cc20ad9e5 pkg/sysinfo: adjust Opt to set new field
This removes the need to have the opts type.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-14 16:39:26 +02:00
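Commit 5cc20ad9e5 lets each `Opt` write directly into a field of the struct being built, so no separate opts type is needed. This is the functional-options pattern; a minimal sketch, where `sysInfo`, its unexported `cg2GroupPath` field, the default value, and `withCgroup2GroupPath` are all hypothetical names chosen for illustration.

```go
package main

import "fmt"

// sysInfo is a hypothetical stand-in with an unexported field that an
// option can set directly.
type sysInfo struct {
	cg2GroupPath string
}

// Opt configures a sysInfo while it is being built.
type Opt func(*sysInfo)

// withCgroup2GroupPath is a hypothetical option setting the cgroup v2
// group path to inspect.
func withCgroup2GroupPath(p string) Opt {
	return func(s *sysInfo) { s.cg2GroupPath = p }
}

// newSysInfo applies a default first, then each option in order, so
// later options override earlier ones.
func newSysInfo(opts ...Opt) *sysInfo {
	s := &sysInfo{cg2GroupPath: "/"}
	for _, o := range opts {
		o(s)
	}
	return s
}

func main() {
	fmt.Println(newSysInfo().cg2GroupPath)
	fmt.Println(newSysInfo(withCgroup2GroupPath("/docker")).cg2GroupPath)
}
```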
Sebastiaan van Stijn
6677ab6a63 pkg/sysinfo: move cgMounts to be a field in SysInfo
We pass the SysInfo struct to all functions. Adding cgMounts as a
(non-exported) field makes passing around this information easier.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-14 16:37:00 +02:00
Sebastiaan van Stijn
10ce0d84c2 pkg/sysinfo.New() move v1 code to a newV1() function
This makes it clearer that this code is the cgroups v1 equivalent of newV2().

Also moves the "options" handling to newV2() because it's currently only used
for cgroupsv2.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-14 16:36:56 +02:00
Sebastiaan van Stijn
6458f750e1 use containerd/cgroups to detect cgroups v2
libcontainer does not guarantee a stable API, and is not intended
for external consumers.

This patch replaces some uses of libcontainer/cgroups with
containerd/cgroups.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2020-11-09 15:00:32 +01:00
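Commit 6458f750e1 switches cgroups v2 detection to containerd/cgroups. A common way to detect a "unified" (v2-only) host is to `statfs` the cgroup mountpoint and compare the filesystem type with `CGROUP2_SUPER_MAGIC` (0x63677270); containerd/cgroups performs a check along these lines. The sketch below uses the standard library's `syscall.Statfs` and assumes Linux; the helper names are hypothetical. A hybrid host (tmpfs at `/sys/fs/cgroup` with cgroup2 mounted under it) reports `false` here.

```go
package main

import (
	"fmt"
	"syscall"
)

// cgroup2SuperMagic is CGROUP2_SUPER_MAGIC from <linux/magic.h>.
const cgroup2SuperMagic = 0x63677270

// isCgroup2Magic reports whether a statfs f_type value identifies cgroup2.
func isCgroup2Magic(fsType int64) bool {
	return fsType == cgroup2SuperMagic
}

// unifiedMode reports whether mountpoint (normally /sys/fs/cgroup) is a
// cgroup2 filesystem, i.e. the host runs cgroups v2 in unified mode.
// Assumes Linux; the Statfs_t.Type width varies by architecture, hence
// the conversion to int64.
func unifiedMode(mountpoint string) (bool, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(mountpoint, &st); err != nil {
		return false, err
	}
	return isCgroup2Magic(int64(st.Type)), nil
}

func main() {
	ok, err := unifiedMode("/sys/fs/cgroup")
	fmt.Println("cgroup v2 unified:", ok, err)
}
```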
Brian Goff
df7031b669 Memoize seccomp value for SysInfo
As it turns out, we call this function every time someone calls `docker
info`, every time a container is created, and every time a container is
started.
Certainly this should be refactored as a whole, but for now, memoize the
seccomp value.

Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2020-09-11 22:48:46 +00:00
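The memoization in commit df7031b669 means the seccomp probe runs at most once per process, however often `docker info`, create, or start are called. A minimal sketch using `sync.OnceValue` (Go 1.21+). Note that the check itself is a swapped-in heuristic, not the prctl-based probe mentioned in commit 6c9d715a8c: per proc(5), the `Seccomp:` line in `/proc/self/status` is only present when the kernel was built with `CONFIG_SECCOMP`, which does not by itself confirm filter-mode support. `hasSeccompField` and `seccompEnabled` are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
	"sync"
)

// hasSeccompField reports whether /proc/<pid>/status content contains a
// "Seccomp:" line, which the kernel only emits with CONFIG_SECCOMP.
func hasSeccompField(status string) bool {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		if strings.HasPrefix(sc.Text(), "Seccomp:") {
			return true
		}
	}
	return false
}

// seccompEnabled reads /proc/self/status on the first call only; later
// calls return the cached result.
var seccompEnabled = sync.OnceValue(func() bool {
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		return false // e.g. not Linux, or /proc not mounted
	}
	return hasSeccompField(string(data))
})

func main() {
	fmt.Println("seccomp (CONFIG_SECCOMP):", seccompEnabled())
}
```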
Kir Kolyshkin
afbeaf6f29 pkg/sysinfo: rm duplicates
The CPU CFS cgroup-aware scheduler is one single kernel feature, not
two, so it does not make sense to have two separate booleans
(CPUCfsQuota and CPUCfsPeriod). Merge these into CPUCfs.

Same for CPU realtime.

For compatibility reasons, /info stays the same for now.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-26 16:19:52 -07:00
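Commit afbeaf6f29 keeps a single internal flag for the CFS bandwidth feature while `/info` keeps its existing shape. A minimal sketch of that split, with hypothetical type names: the internal struct has one `CPUCfs` bool, and a conversion fans it back out to the two legacy fields. CPU realtime would follow the same pattern.

```go
package main

import "fmt"

// cgroupCPUInfo is a hypothetical internal view: CFS bandwidth control is
// one kernel feature, so it gets one boolean.
type cgroupCPUInfo struct {
	CPUCfs bool
}

// infoCPU is a hypothetical /info-style view that keeps the older,
// separate fields for compatibility.
type infoCPU struct {
	CPUCfsPeriod bool
	CPUCfsQuota  bool
}

// toInfo reports the single internal flag through both legacy fields.
func toInfo(c cgroupCPUInfo) infoCPU {
	return infoCPU{CPUCfsPeriod: c.CPUCfs, CPUCfsQuota: c.CPUCfs}
}

func main() {
	fmt.Printf("%+v\n", toInfo(cgroupCPUInfo{CPUCfs: true}))
	fmt.Printf("%+v\n", toInfo(cgroupCPUInfo{}))
}
```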
Kir Kolyshkin
d5da7e5330 pkg/sysinfo/sysinfo_linux.go: fix some comments
Some were misleading or vague, some were plain wrong.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-22 13:13:27 -07:00
Kir Kolyshkin
f02a53d6b9 pkg/sysinfo.applyPIDSCgroupInfo: optimize
For some reason, commit 69cf03700f chose not to use information
already fetched, and called cgroups.FindCgroupMountpoint() instead.
This is not a cheap call, as it has to parse the whole nine yards
of /proc/self/mountinfo, and the info which it tries to get (whether
the pids controller is present) is already available from cgMounts map.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-05-22 13:13:27 -07:00
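Commit f02a53d6b9 relies on the `cgMounts` map already built from `/proc/self/mountinfo`, so checking for the pids controller becomes a map lookup instead of another full parse. A simplified sketch of building such a map for cgroup v1 mounts; `cgroupV1Mounts` is hypothetical, it does not decode octal escapes such as `\040` in paths, and it filters out only a few well-known non-controller mount options.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// cgroupV1Mounts parses mountinfo-formatted text and maps each cgroup v1
// controller (e.g. "memory", "pids") to its mountpoint. Line format:
// <id> <parent> <maj:min> <root> <mountpoint> <opts> [optional...] - <fstype> <source> <superopts>
func cgroupV1Mounts(mountinfo string) map[string]string {
	notControllers := map[string]bool{"rw": true, "ro": true, "xattr": true, "clone_children": true, "noprefix": true}
	mounts := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(mountinfo))
	for sc.Scan() {
		pre, post, ok := strings.Cut(sc.Text(), " - ")
		if !ok {
			continue
		}
		preFields, postFields := strings.Fields(pre), strings.Fields(post)
		if len(preFields) < 5 || len(postFields) < 3 || postFields[0] != "cgroup" {
			continue // not a cgroup v1 mount
		}
		for _, opt := range strings.Split(postFields[2], ",") {
			if notControllers[opt] || strings.Contains(opt, "=") {
				continue // skip mount flags and named hierarchies like name=systemd
			}
			mounts[opt] = preFields[4]
		}
	}
	return mounts
}

func main() {
	sample := "30 25 0:26 / /sys/fs/cgroup/pids rw,nosuid shared:12 - cgroup cgroup rw,pids\n" +
		"31 25 0:27 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid shared:13 - cgroup cgroup rw,cpu,cpuacct\n"
	mounts := cgroupV1Mounts(sample)

	// Checking for the pids controller is now a lookup, not a re-parse.
	_, hasPids := mounts["pids"]
	fmt.Println("pids:", hasPids, "cpu:", mounts["cpu"])
}
```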
Akihiro Suda
f350b53241 cgroup2: implement docker info
ref: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-04-17 07:20:01 +09:00
Akihiro Suda
409bbdc321 cgroup2: enable resource limitation
enable resource limitation by disabling cgroup v1 warnings

resource limitation still doesn't work with rootless mode (even with systemd mode)

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-01-01 02:58:40 +09:00
Rob Gulewich
256eb04d69 Start containers in their own cgroup namespaces
This is enabled for all containers that are not run with --privileged,
if the kernel supports it.

Fixes #38332

Signed-off-by: Rob Gulewich <rgulewich@netflix.com>
2019-05-07 10:22:16 -07:00
Sebastiaan van Stijn
53460047e4 Refactor pkg/sysinfo
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2019-02-04 00:38:12 +01:00
Akihiro Suda
ec87479b7e allow running dockerd in an unprivileged user namespace (rootless mode)
Please refer to `docs/rootless.md`.

TLDR:
 * Make sure `/etc/subuid` and `/etc/subgid` contain the entry for you
 * `dockerd-rootless.sh --experimental`
 * `docker -H unix://$XDG_RUNTIME_DIR/docker.sock run ...`

Signed-off-by: Akihiro Suda <suda.akihiro@lab.ntt.co.jp>
2019-02-04 00:24:27 +09:00
Andrew Hsu
78045a5419 use empty string as cgroup path to grab first find
Signed-off-by: Andrew Hsu <andrewhsu@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-12-07 18:44:00 +01:00
Yong Tang
f023816608 Add memory.kernelTCP support for linux
This fix tries to address the issue raised in 37038, where
there was no memory.kernelTCP support for Linux.

This fix adds MemoryKernelTCP to HostConfig, and passes
the config to runtime-spec.

An additional test case has been added.

This fix fixes 37038.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2018-11-26 21:03:08 +00:00
Daniel Nephin
4f0d95fa6e Add canonical import comment
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-05 16:51:57 -05:00
Derek McGowan
1009e6a40b Update logrus to v1.0.1
Fixes case sensitivity issue

Signed-off-by: Derek McGowan <derek@mcgstyle.net>
2017-07-31 13:16:46 -07:00
Tobias Klauser
6c9d715a8c sysinfo: use Prctl() from x/sys/unix
Use unix.Prctl() instead of manually reimplementing it using
unix.RawSyscall. Also use unix.SECCOMP_MODE_FILTER instead of locally
defining it.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
2017-07-17 10:37:42 +02:00
Christopher Jones
069fdc8a08 [project] change syscall to /x/sys/unix|windows
Changes most references of syscall to golang.org/x/sys/.
The ones not changed are Errno, Signal and SysProcAttr,
as they haven't been implemented in /x/sys/.

Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>

[s390x] switch utsname from unsigned to signed

Per 33267e036f, char on s390x in the /x/sys/unix package is now signed,
so change the build tags.

Signed-off-by: Christopher Jones <tophj@linux.vnet.ibm.com>
2017-07-11 08:00:32 -04:00
Doug Davis
ff42a2eb41 Only show global warnings once
Upon each container create I'm seeing these warnings **every** time in the
daemon output:
```
WARN[0002] Your kernel does not support swap memory limit
WARN[0002] Your kernel does not support cgroup rt period
WARN[0002] Your kernel does not support cgroup rt runtime
```
Showing them for each container.create() fills up the logs and encourages
people to ignore the output being generated, which means it's less likely
they'll see real issues when they happen. In short, I don't think we
need to show these warnings more than once, so let's only show these
warnings at daemon start-up time.

Signed-off-by: Doug Davis <dug@us.ibm.com>
2016-11-30 10:11:42 -08:00
Erik St. Martin
56f77d5ade Implementing support for --cpu-rt-period and --cpu-rt-runtime so that
containers may specify these cgroup values at runtime. This will allow
processes to change their priority to real-time within the container
when CONFIG_RT_GROUP_SCHED is enabled in the kernel. See #22380.

Also added sanity checks for the new --cpu-rt-runtime and --cpu-rt-period
flags to ensure that the kernel supports these features and that
runtime is not greater than period.

The daemon will support a --cpu-rt-runtime flag to initialize the parent
cgroup on startup; this saves the administrator from having to allot runtime
to docker after each restart.

There are additional checks that could be added, but they may go too far:
checking parent cgroups to ensure values are <= parent, and inspecting the
rtprio ulimit and issuing a warning.

Signed-off-by: Erik St. Martin <alakriti@gmail.com>
2016-10-26 11:33:06 -04:00
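The two sanity checks described in commit 56f77d5ade (the kernel must support the real-time settings, and runtime must not exceed period) can be sketched as follows. `rtSupport` and `validateRT` are hypothetical; values are in microseconds, matching the kernel's `cpu.rt_period_us` and `cpu.rt_runtime_us`, and `0` is assumed to mean "not set".

```go
package main

import (
	"errors"
	"fmt"
)

// rtSupport is a hypothetical summary of host support; real-time group
// scheduling requires CONFIG_RT_GROUP_SCHED in the kernel.
type rtSupport struct {
	CPURealtimePeriod  bool
	CPURealtimeRuntime bool
}

// validateRT checks that the kernel supports each requested setting and
// that runtime does not exceed period. Values are in microseconds and 0
// means "not set".
func validateRT(host rtSupport, period, runtime int64) error {
	if period > 0 && !host.CPURealtimePeriod {
		return errors.New("kernel does not support cgroup cpu real-time period")
	}
	if runtime > 0 && !host.CPURealtimeRuntime {
		return errors.New("kernel does not support cgroup cpu real-time runtime")
	}
	if period > 0 && runtime > period {
		return fmt.Errorf("cpu real-time runtime (%d) cannot be greater than period (%d)", runtime, period)
	}
	return nil
}

func main() {
	host := rtSupport{CPURealtimePeriod: true, CPURealtimeRuntime: true}
	fmt.Println(validateRT(host, 1000000, 950000))
	fmt.Println(validateRT(host, 1000000, 1000001))
	fmt.Println(validateRT(rtSupport{}, 0, 950000))
}
```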
Kenfe-Mickael Laventure
7e12c3bb99 Update containerd and runc
containerd: 837e8c5e1cad013ed57f5c2090c8591c10cbbdae
runc: 02f8fa7863dd3f82909a73e2061897828460d52f

Signed-off-by: Kenfe-Mickael Laventure <mickael.laventure@gmail.com>
2016-10-05 14:47:15 -07:00
Antonio Murdaca
44ccbb317c *: fix logrus.Warn[f]
Signed-off-by: Antonio Murdaca <runcom@redhat.com>
2016-06-11 19:42:38 +02:00
Jessica Frazelle
69cf03700f pids limit support
update bash completion for pids limit

update check config for kernel

add docs for pids limit

add pids stats

add stats to docker client

Signed-off-by: Jessica Frazelle <acidburn@docker.com>
2016-03-08 07:55:01 -08:00
Christy Perez
5b3fc7aab2 Match case for variables in sysinfo pkg
I noticed an inconsistency when reviewing docker/pull/20692.

Changing Ip to IP and Nf to NF.

More info: The golang folks recommend that you keep the initials consistent:
https://github.com/golang/go/wiki/CodeReviewComments#initialisms.

Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>
2016-03-01 10:37:05 -06:00
Alexander Morozov
781a33b6e7 Reuse subsystems mountpoints between checks
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
2016-01-20 19:20:59 -08:00
Jessica Frazelle
40d5ced9d0 check seccomp is configured in the kernel
Signed-off-by: Jessica Frazelle <acidburn@docker.com>
2016-01-12 09:45:21 -08:00
Ma Shimiao
843084b08b Add support for blkio read/write iops device
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2015-12-21 09:14:49 +08:00