Hello Elastic people,
Today ECE assumes it has the entire host for its own use. For cloud deployments and VMs that might be fine, but for on-prem hosts and future systems with a lot of resources it might not be.
The trend is toward more cores per CPU (see the upcoming AMD Genoa and Bergamo with up to 128 cores per CPU), so hosts (allocators) will have far more compute resources at their disposal (roughly double, compared to the previous generation) in a single box. On Elastic ingest nodes (where most of the parsing happens anyway) it would make sense to have an option to influence the calculated CPU quota for ingest instances; there are already some requests for this. The same applies to the ECE system containers.
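To make the idea concrete, here is a minimal sketch (hypothetical helper, not ECE code) of how a containerised process could derive its usable core count from the Docker/cgroup CPU quota instead of counting entries in /proc/cpuinfo:

```python
def cpu_quota_cores(quota_us: int, period_us: int, host_cores: int) -> int:
    """Derive the core budget from a cgroup v1 CFS quota.

    quota_us of -1 (or any non-positive value) means "no limit" in
    cgroup v1, in which case we fall back to the host core count --
    which is exactly what reading /proc/cpuinfo would give you anyway.
    """
    if quota_us <= 0 or period_us <= 0:
        return host_cores
    # e.g. a quota of 400000us per 100000us period allows 4 full cores
    return max(1, quota_us // period_us)
```

On a dual-Bergamo box, a container limited with `docker run --cpus=4` would see `cpu_quota_cores(400000, 100000, 256) == 4` rather than sizing itself for all 256 logical cores.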
Some time ago (2021), I requested that the same memory and CPU contention mechanisms (cgroups via Docker) already used for the Elastic Stack be applied to the ECE system containers (frc-runners…, etc.), which are currently unaware of the limits imposed on their containers.
The ECE system containers do not read the Docker cgroup limits; /proc/cpuinfo, /proc/meminfo and /proc/swaps report host-wide values, not the limits imposed on the container.
Ideally the runner app inside the container should not use /proc/meminfo at all (it is not cgroup-aware) and should instead rely on the Docker cgroup limits, e.g. /sys/fs/cgroup/memory/memory.limit_in_bytes.
With so many compute resources at our disposal in a single system (e.g. dual Bergamo, 2x128 cores, and 12 TB RAM or more via CXL), we might run multiple Filebeat and Logstash instances on the same host, but we would like to avoid them competing for resources with the ECE system containers.
Since there is no public GitHub repo for ECE, we need to discuss it here.