Elastic-agent overhead/resource consumption

Hi,

I'm using the standalone Elastic Agent and I'm wondering why the memory usage is so high.
Without any resource limits, elastic-agent uses about 1 GB of memory to monitor a single file.
(Filebeat on its own uses 100-150 MB for 3-4 files.)

Inside the container there is far more running than seems to be needed.
We don't want to use anything dynamic or the config reload option.

The container is running apm-server, cloudbeat, packetbeat and osquerybeat. Why are they needed, and is there a way to turn them off?

I know the documentation isn't perfect for standalone mode, but I can't find anything about this.

BR,
lineconnect

Could you provide the standalone configuration that you're running? The Elastic Agent is (currently) really just a wrapper around Beats, so if you really only have one integration enabled, monitoring a single file, I would expect essentially a single Filebeat process to be running.

Sure, the configuration is:

outputs:
  default:
    type: elasticsearch
    hosts: ['${ELASTICSEARCH_HOST:elasticsearch}:${ELASTICSEARCH_PORT:9200}']
    username: '${ELASTICSEARCH_USERNAME}'
    password: '${ELASTICSEARCH_PASSWORD}'

providers:
  host:
    enabled: false
  agent: 
    enabled: false
  docker:
    enabled: false
  kubernetes:
    enabled: false
  kubernetes_leaderelection:
    enabled: false

agent:
  monitoring:
    enabled: false
    logs: false
    metrics: false
  reload: 
    enabled: false
  logging:
    level: debug

inputs:
  - name: radius-rule-engine
    use_output: default
    json:
      add_error_key: true
      expand_keys: true
      keys_under_root: true
      overwrite_keys: true
    meta:
      package:
        name: log
        version: 1.0.0
    data_stream:
      namespace: engine
      type: radius
    streams:
      - data_stream:
          dataset: ${env.customer}
        ignore_older: 5m
        processors:
          - drop_event:
              when:
                regexp:
                  message: '.*automated-test-authentication-deae972a-6956-4d4d-8166-f1317c45d810.*'
          - add_fields:
              target: ''
              fields:
                customer: ${env.customer}
                usercount: ${env.CUSTOMER_USERCOUNT}
          - add_tags:
              tags: [engine]
              target: ''
        pipeline: remove-meta-log-pipeline
        paths:
          - /var/log/radius/rule-engine-log.json

Output of ps aux:

elastic-agent@radius-b7b5bfbcd-2pjx7:~$ ps aux
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
elastic+       1  0.0  0.0   2500   584 ?        Ss   14:46   0:00 /usr/bin/tini -- /usr/local/bin/docker-entrypoint -c /etc/elastic-agent.yml -e
elastic+       7  0.3  0.3 2010728 61972 ?       Sl   14:46   1:15 elastic-agent container -c /etc/elastic-agent.yml -e
elastic+      33  0.0  0.5 1688952 83504 ?       Sl   14:46   0:16 /usr/share/elastic-agent/data/elastic-agent-0ffbed/install/osquerybeat-8.3.3-linux-x86_64/osquerybeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug
elastic+      59  0.0  0.1 1253732 31620 ?       Sl   14:46   0:06 /usr/share/elastic-agent/data/elastic-agent-0ffbed/install/osquerybeat-8.3.3-linux-x86_64/osqueryd --extensions_timeout=10 --flagfile=osquery/osquery.flags --force=true --logger_plugin=osq_logger --events_expir
elastic+      61  0.0  0.0 1372520 8540 ?        Sl   14:46   0:05 /usr/share/elastic-agent/data/elastic-agent-0ffbed/install/osquerybeat-8.3.3-linux-x86_64/osquery-extension.ext --verbose --socket /tmp/295486396/osquery.sock --timeout 10 --interval 3
elastic+     135  0.1  0.7 1780032 129848 ?      Sl   14:47   0:23 /usr/share/elastic-agent/data/elastic-agent-0ffbed/install/filebeat-8.3.3-linux-x86_64/filebeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc
elastic+     267  0.0  0.0   4248  3460 pts/0    Ss+  14:49   0:00 bash
elastic+   23023  0.2  0.5 1551964 94688 ?       Sl   20:11   0:00 /usr/share/elastic-agent/data/elastic-agent-0ffbed/install/packetbeat-8.3.3-linux-x86_64/packetbeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -
elastic+   23045  0.0  0.0   4248  3472 pts/1    Ss   20:11   0:00 bash
elastic+   23103  0.0  0.4 1612172 77852 ?       Sl   20:12   0:00 /usr/share/elastic-agent/data/elastic-agent-0ffbed/install/apm-server-8.3.3-linux-x86_64/apm-server -E management.enabled=true -E gc_percent=${APMSERVER_GOGC:100} -E logging.level=debug -E http.enabled=true -E 
elastic+   23115  0.0  0.0   5900  2820 pts/1    R+   20:12   0:00 ps aux

Hmm, by chance are you running multiple agents on the same host?

I wouldn't expect osquery/osquerybeat, packetbeat, or apm-server to be running on this agent if it truly is only running the defined input (Filebeat).

Could you try running the inspect command to make sure it didn't fall back to a Fleet policy or something?
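
In case it helps, the command I have in mind is something like this (adjust the config path if yours differs):

elastic-agent inspect -c /etc/elastic-agent.yml

It should print the policy the agent actually resolved, so we can see whether anything beyond your single log input ended up enabled.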

There are other Elastic Agents running, but as a DaemonSet on each node, not inside this pod. They can't see each other currently.

Output of elastic-agent inspect -c /etc/elastic-agent.yml:

elastic-agent@radius-b7b5bfbcd-2pjx7:~$ elastic-agent inspect -c /etc/elastic-agent.yml 
agent:
  headers: null
  id: a81522b2-3acb-4183-a864-23b2845579ce
  logging:
    level: debug
  monitoring:
    enabled: false
    http:
      buffer: null
      enabled: false
      host: ""
      port: 6791
    logs: false
    metrics: false
  reload:
    enabled: false
inputs:
- data_stream:
    namespace: engine
    type: radius
  json:
    add_error_key: true
    expand_keys: true
    keys_under_root: true
    overwrite_keys: true
  meta:
    package:
      name: log
      version: 1.0.0
  name: radius-rule-engine
  streams:
  - data_stream:
      dataset: ${env.customer}
    ignore_older: 5m
    paths:
    - /var/log/radius/rule-engine-log.json
    pipeline: remove-meta-log-pipeline
    processors:
    - drop_event:
        when:
          regexp:
            message: .*automated-test-authentication-deae972a-6956-4d4d-8166-f1317c45d810.*
    - add_fields:
        fields:
          customer: ${env.customer}
          usercount: ${env.CUSTOMER_USERCOUNT}
        target: ""
    - add_tags:
        tags:
        - engine
        target: ""
  use_output: default
outputs:
  default:
    hosts:
    - elasticsearch.efk:9200
    password: REMOVED
    type: elasticsearch
    username: logagent-radius-felix
path:
  config: /usr/share/elastic-agent/state
  data: /usr/share/elastic-agent/state/data
  home: /usr/share/elastic-agent/state/data
  logs: /usr/share/elastic-agent/state
providers:
  agent:
    enabled: false
  docker:
    enabled: false
  host:
    enabled: false
  kubernetes:
    enabled: false
  kubernetes_leaderelection:
    enabled: false
runtime:
  arch: amd64
  os: linux
  osinfo:
    family: debian
    major: 20
    minor: 4
    patch: 4
    type: linux
    version: 20.04.4 LTS (Focal Fossa)
elastic-agent@radius-b7b5bfbcd-2pjx7:~$ 

I actually have no clue why elastic-agent is creating the other processes.

Someone from the Elastic side might be able to provide more insight here, but I suspect this "additional" resource overhead comes in two forms:

  1. As mentioned, the Elastic Agent is really just a wrapper around Beats, so it will probably have some extra overhead just to "exist".
  2. Some of it might be related to Go's garbage collection. If Go doesn't "need" to clean up unused memory, it can simply leave it allocated, making the process look like it needs more resources than it really does.

If I had to guess, it's probably a mixture of both of the above.
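
For completeness, if you ever did want to put a ceiling on it, a minimal sketch of resource requests/limits for the agent container in your pod spec could look like the following (the container name and values are placeholders, not taken from your manifests). I know you'd rather not hard-cap the memory, so treat this as optional:

# Hypothetical snippet for the agent container in the pod template;
# tune the values for your own workload.
containers:
  - name: elastic-agent
    resources:
      requests:
        memory: "200Mi"
      limits:
        memory: "512Mi"

Judging by your ps output, the Beats are also started with a gc_percent flag (defaulting to the usual Go value of 100), so there may be a GC tuning knob there as well, but I'd check the resolved policy with inspect before tuning anything.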

I would agree in general, but what confuses me is that the agents running as a DaemonSet have multiple inputs, yet they spawn only the Beats they actually need and no other processes.

We don't want to completely restrict the memory usage for those (for multiple reasons), and a little overhead is fine, but going from 150 MB to 1 GB is hard to justify.

But thanks for having a look :slight_smile:
