Enhancement: make ECE system containers memory- and CPU-contention/limit aware

Hello Elastic people,

Today, ECE assumes it has the entire host to itself. For cloud and VM deployments that might be fine, but for on-prem and future systems with a lot of resources it might not be.

The trend is toward more cores per CPU (see the upcoming AMD Genoa and Bergamo with up to 128 cores per CPU), so hosts (allocators) will have far more compute resources in a single box, roughly double compared to the previous generation. On Elastic ingest nodes (where most of the parsing happens anyway), it would make sense to have a custom option to influence the calculated CPU quota for ingest instances. There are already some requests for this, not only for ingest instances but also for the ECE system containers.

Some time ago (2021), I requested that more or less the same memory and CPU contention mechanisms (cgroups via Docker) already used for the Elastic Stack be applied to the ECE system containers (frc-runners…, etc.), which are unaware of what limits are imposed on them.

ECE system containers do not read the Docker cgroup limits; those limits constrain the container, but /proc/cpuinfo, /proc/meminfo, and /proc/swaps still report the host's resources. Ideally, the runner app inside the container should not use /proc/meminfo at all (it is not cgroup-aware) and should instead rely on the Docker cgroup limit at /sys/fs/cgroup/memory/memory.limit_in_bytes.

With so many compute resources at our disposal in a single system (e.g. dual Bergamo with 2x128 cores and 12 TB of RAM, or more via CXL), we might run multiple Filebeat and Logstash instances on the same host, but we would like to avoid them competing for resources with the ECE system containers.

Since there is no public GitHub repo for ECE, we need to discuss it here.

@kimchy

Would that fit as an innovation?

On the same enhancement topic:

"What are the benefits of cgroup v2 that you think would be useful here?"
Please see the following article which nicely tracks all advances.

focuses on simplicity
Friendly to rootless containers - meaning --cpus=2 --cpu-shares=2000 which ECE should use
eBPF-oriented
which takes me not only for device access control but to a better network control

For a more detailed discussion of modern kernel features (cpu.pressure, memory.pressure, and io.pressure), see the thinking behind cgroups v2 here:
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#issues-with-v1-and-rationales-for-v2
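For illustration, all three v2 pressure files share one simple text format that a monitoring agent could consume directly; a hypothetical parser (the format follows the kernel PSI documentation, the function name is my own):

```python
def parse_pressure(text):
    """Parse a cgroup v2 PSI file (cpu.pressure, memory.pressure, io.pressure).

    Each line looks like:
        some avg10=0.12 avg60=0.08 avg300=0.01 total=123456
    where the avg* fields are the percentage of time tasks were stalled
    on the resource and total is cumulative stall time in microseconds.
    """
    parsed = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()  # kind is "some" or "full"
        parsed[kind] = {key: float(value)
                        for key, value in (f.split("=") for f in fields)}
    return parsed
```

ECE system containers could use these numbers to detect resource contention directly instead of inferring it from host-wide /proc statistics.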

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.