Commit Graph

65 Commits

Author SHA1 Message Date
Sebastiaan van Stijn
c5991341eb remove support for deprecated kernel memory limit
kernel-memory limits are not supported in cgroups v2, and were obsoleted in
[kernel v5.4], producing a `ENOTSUP` in kernel v5.16. Support for this option
was removed in runc and other runtimes, as various LTS kernels contained a
broken implementation, resulting in unpredictable behavior.

We deprecated this option in [moby@b8ca7de], producing a warning when used,
and actively ignore the option since [moby@0798f5f].

Given that setting this option had no effect in most situations, we should
just remove this option instead of continuing to handle it with the expectation
that a runtime may still support it.

Note that we still support RHEL 8 (kernel 4.18) and RHEL 9 (kernel 5.14). We
no longer build packages for Ubuntu 20.04 (kernel 5.4) and Debian Bullseye 11
(kernel 5.10), which still have an LTS / ESM programme, but for those it would
only impact situations where a runtime is used that still supports it, and
an old API version was used.

[kernel v5.4]: https://github.com/torvalds/linux/commit/0158115f702b0ba208ab0
[moby@b8ca7de]: b8ca7de823
[moby@0798f5f]: 0798f5f5cf

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-09-16 13:08:36 +02:00
Sebastiaan van Stijn
18a1b61b49 pkg/sysinfo: remove // import comments
These comments were added to enforce using the correct import path for
our packages ("github.com/docker/docker", not "github.com/moby/moby").
However, when working in go module mode (not GOPATH / vendor), they have
no effect, so their impact is limited.

Remove these imports in preparation of migrating our code to become an
actual go module.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-05-30 15:59:18 +02:00
Sebastiaan van Stijn
6505d3877c API: /info: remove BridgeNfIptables, BridgeNfIp6tables fields
The `BridgeNfIptables` and `BridgeNfIp6tables` fields in the
`GET /info` response were deprecated in API v1.48, and are now omitted
in API v1.50.

With this patch, old API version continue to return the field:

    curl -s --unix-socket /var/run/docker.sock http://localhost/v1.48/info | jq .BridgeNfIp6tables
    false

    curl -s --unix-socket /var/run/docker.sock http://localhost/v1.48/info | jq .BridgeNfIptables
    false

Omitting the field in API v1.50 and above

    curl -s --unix-socket /var/run/docker.sock http://localhost/v1.50/info | jq .BridgeNfIp6tables
    null

    curl -s --unix-socket /var/run/docker.sock http://localhost/v1.50/info | jq .BridgeNfIptables
    null

This reverts commit eacbbdeec6, and re-applies
a variant of 5d2006256f

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-05-16 19:49:52 +02:00
Sebastiaan van Stijn
eacbbdeec6 Revert "API: /info: remove BridgeNfIptables, BridgeNfIp6tables fields"
This reverts commit 5d2006256f, which
caused some issues in the docker/cli formatting code that needs some
investigating.

Let's (temporarily) revert this while we look what's wrong.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-04-11 14:47:10 +02:00
Sebastiaan van Stijn
5d2006256f API: /info: remove BridgeNfIptables, BridgeNfIp6tables fields
The `BridgeNfIptables` and `BridgeNfIp6tables` fields in the
`GET /info` response were deprecated in API v1.48, and are now omitted
in API v1.49.

With this patch, old API version continue to return the field:

    curl -s --unix-socket /var/run/docker.sock http://localhost/v1.48/info | jq .BridgeNfIp6tables
    false

    curl -s --unix-socket /var/run/docker.sock http://localhost/v1.48/info | jq .BridgeNfIptables
    false

Omitting the field in API v1.49 and above

    curl -s --unix-socket /var/run/docker.sock http://localhost/v1.49/info | jq .BridgeNfIp6tables
    null

    curl -s --unix-socket /var/run/docker.sock http://localhost/v1.49/info | jq .BridgeNfIptables
    null

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-04-10 14:26:42 +02:00
Sebastiaan van Stijn
1359772433 pkg/sysinfo: parse cpuset.cpus/mems once and memoize
Preserve the result instead of parsing these for each container that
specifies cpuset options,

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-01-06 17:53:25 +01:00
Sebastiaan van Stijn
aa696ffbb1 pkg/sysinfo: touch-up docs for cgroupCpusetInfo.Cpus, Mems
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-01-06 10:46:42 +01:00
Sebastiaan van Stijn
799501d172 pkg/sysinfo: rename vars/arguments for clarity
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-01-06 10:40:09 +01:00
Sebastiaan van Stijn
0d51680f91 pkg/sysinfo: stub out parsing cpusets on non-linux
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2025-01-06 10:40:08 +01:00
Sebastiaan van Stijn
8991c4e382 Deprecate BridgeNfIptables and BridgeNfIp6tables fields
The netfilter module is now loaded on-demand, and no longer during daemon
startup, making these fields obsolete. These fields are now always `false`
and will be removed in the next relase.

This patch deprecates:

- the `BridgeNfIptables` field in `api/types/system.Info`
- the `BridgeNfIp6tables` field in `api/types/system.Info`
- the `BridgeNFCallIPTablesDisabled` field in `pkg/sysinfo.SysInfo`
- the `BridgeNFCallIP6TablesDisabled` field in `pkg/sysinfo.SysInfo`

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2024-12-16 22:10:05 +01:00
Sebastiaan van Stijn
6c036f267f pkg/sysinfo: rename max/min as it collides with go1.21 builtin
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2023-08-26 22:02:25 +02:00
Sebastiaan van Stijn
f73d72bfdc pkg: replace some README's with GoDoc package descriptions
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-09-30 17:11:37 +02:00
Sebastiaan van Stijn
5d10c6ec67 Update handling of deprecated kernel (tcp) memory options
- Omit `KernelMemory` and `KernelMemoryTCP` fields in `/info` response if they're
  not supported, or when using API v1.42 or up.
- Re-enable detection of `KernelMemory` (as it's still needed for older API versions)
- Remove warning about kernel memory TCP in daemon logs (a warning is still returned
  by the `/info` endpoint, but we can consider removing that).
- Prevent incorrect "Minimum kernel memory limit allowed" error if the value was
  reset because it's not supported by the host.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2022-03-17 09:56:39 +01:00
Sebastiaan van Stijn
1fb62f455c pkg/sysinfo: collect warnings in SysInfo struct
This allows the warnings to be consumed in other locations.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-14 17:28:25 +02:00
Sebastiaan van Stijn
208d3c6efb pkg/sysinfo: move cg2Controllers to be a field in SysInfo and unify v1/v2
We pass the SysInfo struct to all functions. Adding cg2Controllers as a
(non-exported) field makes passing around this information easier.

Now that infoCollector and infoCollectorV2 have the same signature, we can
simplify some bits and use a single slice for all "collectors".

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-14 16:39:44 +02:00
Sebastiaan van Stijn
5cc20ad9e5 pkg/sysinfo: adjust Opt to set new field
This removes the need to have the opts type.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-14 16:39:26 +02:00
Sebastiaan van Stijn
ca27b473cc pkg/sysinfo: move cg2GroupPath to be a field in SysInfo
We pass the SysInfo struct to all functions. Adding cg2GroupPath as a
(non-exported) field makes passing around this information easier.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-14 16:37:03 +02:00
Sebastiaan van Stijn
6677ab6a63 pkg/sysinfo: move cgMounts to be a field in SysInfo
We pass the SysInfo struct to all functions. Adding cgMounts as a
(non-exported) field makes passing around this information easier.

Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2021-07-14 16:37:00 +02:00
Kir Kolyshkin
afbeaf6f29 pkg/sysinfo: rm duplicates
The CPU CFS cgroup-aware scheduler is one single kernel feature, not
two, so it does not make sense to have two separate booleans
(CPUCfsQuota and CPUCfsPeriod). Merge these into CPUCfs.

Same for CPU realtime.

For compatibility reasons, /info stays the same for now.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
2020-06-26 16:19:52 -07:00
Akihiro Suda
f350b53241 cgroup2: implement docker info
ref: https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
2020-04-17 07:20:01 +09:00
Tobias Klauser
ba8a15694a Use functions from x/sys/unix to get number of CPUs on Linux
Use Getpid and SchedGetaffinity from golang.org/x/sys/unix to get the
number of CPUs in numCPU on Linux.

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
2019-06-18 19:26:56 +02:00
Rob Gulewich
256eb04d69 Start containers in their own cgroup namespaces
This is enabled for all containers that are not run with --privileged,
if the kernel supports it.

Fixes #38332

Signed-off-by: Rob Gulewich <rgulewich@netflix.com>
2019-05-07 10:22:16 -07:00
Yong Tang
f023816608 Add memory.kernelTCP support for linux
This fix tries to address the issue raised in 37038 where
there were no memory.kernelTCP support for linux.

This fix add MemoryKernelTCP to HostConfig, and pass
the config to runtime-spec.

Additional test case has been added.

This fix fixes 37038.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2018-11-26 21:03:08 +00:00
Justin Cormack
f8e876d761 Fix denial of service with large numbers in cpuset-cpus and cpuset-mems
Using a value such as `--cpuset-mems=1-9223372036854775807` would cause
`dockerd` to run out of memory allocating a map of the values in the
validation code. Set limits to the normal limit of the number of CPUs,
and improve the error handling.

Reported by Huawei PSIRT.

Signed-off-by: Justin Cormack <justin.cormack@docker.com>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
2018-10-05 15:09:02 +02:00
Daniel Nephin
4f0d95fa6e Add canonical import comment
Signed-off-by: Daniel Nephin <dnephin@docker.com>
2018-02-05 16:51:57 -05:00
Erik St. Martin
56f77d5ade Implementing support for --cpu-rt-period and --cpu-rt-runtime so that
containers may specify these cgroup values at runtime. This will allow
processes to change their priority to real-time within the container
when CONFIG_RT_GROUP_SCHED is enabled in the kernel. See #22380.

Also added sanity checks for the new --cpu-rt-runtime and --cpu-rt-period
flags to ensure that that the kernel supports these features and that
runtime is not greater than period.

Daemon will support a --cpu-rt-runtime flag to initialize the parent
cgroup on startup, this prevents the administrator from alotting runtime
to docker after each restart.

There are additional checks that could be added but maybe too far? Check
parent cgroups to ensure values are <= parent, inspecting rtprio ulimit
and issuing a warning.

Signed-off-by: Erik St. Martin <alakriti@gmail.com>
2016-10-26 11:33:06 -04:00
Yong Tang
3707a76921 Fix wrong CPU count after CPU hot-plugging on Windows
This fix tries to fix wrong CPU count after CPU hot-plugging.
On windows, GetProcessAffinityMask has been used to probe the
number of CPUs in real time.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
2016-06-25 22:20:29 -07:00
Jessica Frazelle
69cf03700f pids limit support
update bash commpletion for pids limit

update check config for kernel

add docs for pids limit

add pids stats

add stats to docker client

Signed-off-by: Jessica Frazelle <acidburn@docker.com>
2016-03-08 07:55:01 -08:00
Christy Perez
5b3fc7aab2 Match case for variables in sysinfo pkg
I noticied an inconsistency when reviewing docker/pull/20692.

Changing Ip to IP and Nf to NF.

More info: The golang folks recommend that you keep the initials consistent:
https://github.com/golang/go/wiki/CodeReviewComments#initialisms.

Signed-off-by: Christy Perez <christy@linux.vnet.ibm.com>
2016-03-01 10:37:05 -06:00
Jessica Frazelle
40d5ced9d0 check seccomp is configured in the kernel
Signed-off-by: Jessica Frazelle <acidburn@docker.com>
2016-01-12 09:45:21 -08:00
Ma Shimiao
843084b08b Add support for blkio read/write iops device
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2015-12-21 09:14:49 +08:00
Justas Brazauskas
927b334ebf Fix typos found across repository
Signed-off-by: Justas Brazauskas <brazauskasjustas@gmail.com>
2015-12-13 18:04:12 +02:00
Ma Shimiao
3f15a055e5 Add support for blkio read/write bps device
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2015-12-04 09:26:03 +08:00
Ma Shimiao
0fbfa1449d Add support for blkio.weight_device
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2015-11-11 23:06:36 +08:00
Antonio Murdaca
94464e3a5e Validate --cpuset-cpus, --cpuset-mems
Before this patch libcontainer badly errored out with `invalid
argument` or `numerical result out of range` while trying to write
to cpuset.cpus or cpuset.mems with an invalid value provided.
This patch adds validation to --cpuset-cpus and --cpuset-mems flag along with
validation based on system's available cpus/mems before starting a container.

Signed-off-by: Antonio Murdaca <runcom@linux.com>
2015-09-27 16:38:58 +02:00
qhuang
aa1780997f Add support for memory reservation
Signed-off-by: qhuang <qhuang@10.0.2.15>
2015-09-23 14:02:45 +08:00
Qiang Huang
b6f1b4ad35 Add support for kernel memory limit
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-08-19 23:56:55 +08:00
David Calavera
c2bc637a03 Remove pointers from the SysInfo struct.
Because they can just be values.

Signed-off-by: David Calavera <david.calavera@gmail.com>
2015-08-06 15:57:21 -07:00
Qiang Huang
b7599d58cb Check sysinfo for Cpuset cpu.shares and blkio
Carried: #14015

If kernel is compiled with CONFIG_FAIR_GROUP_SCHED disabled cpu.shares
doesn't exist.
If kernel is compiled with CONFIG_CFQ_GROUP_IOSCHED disabled blkio.weight
doesn't exist.
If kernel is compiled with CONFIG_CPUSETS disabled cpuset won't be
supported.

We need to handle these conditions by checking sysinfo and verifying them.

Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-08-05 22:35:18 +08:00
Hu Keping
7390cc5300 Fix golint warning on pkg/sysinfo
Signed-off-by: Hu Keping <hukeping@huawei.com>
2015-08-01 18:24:49 +08:00
Raghavendra K T
921da495d2 Add the memory swappiness tuning option to docker.
Memory swappiness option takes 0-100, and helps to tune swappiness
behavior per container.
For example, When a lower value of swappiness is chosen
the container will see minimum major faults. When no value is
specified for memory-swappiness in docker UI, it is inherited from
parent cgroup. (generally 60 unless it is changed).

Signed-off-by: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
2015-07-12 13:16:33 +05:30
Brian Goff
9b05aa6ee8 cleanup sysinfo package
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
2015-06-17 20:41:14 -04:00
Lei Jitang
57d12a0e0a Add bridge-nf-call-iptables/bridge-nf-call-ipv6tables to docker info
Signed-off-by: Lei Jitang <leijitang@huawei.com>
2015-06-17 09:19:11 +08:00
John Howard
b7ee717a10 Windows: Refactor sysinfo for compilation
Signed-off-by: John Howard <jhoward@microsoft.com>
2015-05-14 09:44:51 -07:00
Qiang Huang
a8250a0b20 Remove redundant warning
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-05-12 18:02:30 +08:00
Ma Shimiao
dccb8b5c33 add cpu.cfs_period_us support
Signed-off-by: Ma Shimiao <mashimiao.fnst@cn.fujitsu.com>
2015-05-09 10:02:46 +08:00
HuKeping
a4a924e1b6 Feature: option for disable OOM killer
Add cgroup support for disable OOM killer.

Signed-off-by: Hu Keping <hukeping@huawei.com>
2015-05-04 21:11:29 +08:00
Brian Goff
fc9033a9c8 Merge pull request #12664 from Mashimiao/sysinfo-support-ipv4_forward-check
sysinfo: add IPv4Forwarding check
2015-04-30 11:44:44 -04:00
Qiang Huang
667b1e220c simplify memory limit check
If memory cgroup is mounted, memory limit is always supported,
no need to check if these files are exist.

Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-04-24 08:43:44 +08:00
Qiang Huang
47e5acfbae add devices cgroup check and errors
Signed-off-by: Qiang Huang <h.huangqiang@huawei.com>
2015-04-24 08:37:59 +08:00