Unable to manage high load using elastic-agent.yml

Hi All,

We are collecting logs from the ForgeRock API using Elastic Agent, which forwards the logs to Kafka. From there, the logs are processed by Logstash.

We are noticing a delay in log ingestion starting from the Elastic Agent itself, particularly during high-load testing scenarios. The delay is visible directly in the Elastic Agent logs. Could anyone please review the elastic-agent.yml config below and suggest how to tune it to handle high load?

Below is the elastic-agent.yml config:

logging:
  files:
    keepfiles: 7
    name: elastic-agent
    path: /var/log/elastic-agent/
    permissions: 420
  level: debug
  to_files: true

inputs:
  - id: generic-httpjson-staging
    type: httpjson
    streams:
      - config_version: 2
        data_stream:
          dataset: httpjson.generic
          type: logs
        id: httpjson-httpjson.staging
        interval: 30s
        publisher_pipeline.disable_host: true
        request.method: GET
        request.ssl:
          verification_mode: none
        request.url: "https://forgerock.io/monitoring/logs?source=am-everything,idm-everything"
        request.rate_limit:
          limit: '[[.last_response.header.Get "X-Ratelimit-Limit"]]'
          remaining: '[[.last_response.header.Get "X-Ratelimit-Remaining"]]'
          reset: '[[.last_response.header.Get "X-Ratelimit-Reset"]]'
          early_limit: 5
        request.retry:
          max_attempts: 5
          wait_min: 5s
          wait_max: 30s
        request.tracer:
          filename: /var/log/elastic-agent/http-request-trace-*.ndjson
          maxbackups: 5
        request.transforms:
          - set:
              target: header.x-api-key
              value: ----
          - set:
              target: header.x-api-secret
              value: ----
          - set:
              target: url.params.beginTime
              value: '[[.cursor.last_timestamp]]'
              default: '[[ formatDate (now (parseDuration "-1h")) "2006-01-02T15:04:05-07:00" ]]'
          - set:
              target: url.params.endTime
              value: |-
                [[- $last := (parseDate .cursor.last_timestamp "2006-01-02T15:04:05-07:00") -]]
                [[- $day := (parseDuration "24h") -]]
                [[- $end := 0 -]][[- /* Predeclare $end. */ -]]
                [[- with $last -]]
                  [[- $end = .Add $day -]]
                [[- end -]]
                [[- with $end -]]
                  [[- $recent := (now (parseDuration "-10s")) -]][[- /* Ensure that the API has stabilised the documents' presence. */ -]]
                  [[- if .Before $recent -]]
                    [[- formatDate $end "2006-01-02T15:04:05-07:00" -]]
                  [[- else -]]
                    [[- formatDate $recent "2006-01-02T15:04:05-07:00" -]]
                  [[- end -]]
                [[- end -]]
              default: |-
                [[- $start := (now (parseDuration "-1h")) -]]
                [[- $day := (parseDuration "24h") -]]
                [[- $end := 0 -]][[- /* Predeclare $end. */ -]]
                [[- with $start -]]
                  [[- $end = .Add $day -]]
                [[- end -]]
                [[- with $end -]]
                  [[- $recent := (now (parseDuration "-10s")) -]][[- /* Stabilisation time. */ -]]
                  [[- if .Before $recent -]]
                    [[- formatDate $end "2006-01-02T15:04:05-07:00" -]]
                  [[- else -]]
                    [[- formatDate $recent "2006-01-02T15:04:05-07:00" -]]
                  [[- end -]]
                [[- end -]]
        response.split:
          target: body.result
          ignore_empty_value: true
        response.pagination:
          - set:
              target: url.params.endTime
              value: '[[.last_response.url.params.Get "endTime"]]'
          - set:
              target: url.params.beginTime
              value: '[[.last_response.url.params.Get "beginTime"]]'
          - set:
              target: url.params._pagedResultsCookie
              value: '[[.last_response.body.pagedResultsCookie]]'
              fail_on_template_error: true
        cursor:
          last_timestamp:
            value: '[[.last_response.url.params.Get "endTime"]]'
        tags:
          - staging
        fields:
          environment: "staging"
        processors:
          - fingerprint:
              fields: ["@timestamp", "message"]
              target_field: "fingerprint"
              method: "sha256"
              encoding: "hex"

outputs:
  default:
    hosts:
      - b-1.kafka-----.amazonaws.com:9094
      - b-2.kafka-----.amazonaws.com:9094
      - b-3.kafka-----.amazonaws.com:9094
    producer:
      compression: gzip
    ssl:
      enabled: true
      truststore_location: /etc/pki/tls/certs/kafka.client.truststore.jks
      truststore_password: "----"
    topic: test_app_topic
    type: kafka
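
To clarify the kind of tuning we are asking about: as far as we understand, the output also supports memory-queue settings (queue.mem.events, queue.mem.flush.min_events, queue.mem.flush.timeout) that control how events are batched before being published. A minimal sketch of what we mean, with purely illustrative values (not settings we currently run):

outputs:
  default:
    type: kafka
    # hosts, ssl and topic as in the config above
    # Illustrative values only -- we have not validated these for our load.
    queue.mem:
      events: 8192            # total events the in-memory queue can hold
      flush.min_events: 2048  # publish in larger batches
      flush.timeout: 5s       # or after this timeout, whichever comes first

Is this the right knob for high load, or should we be looking at the httpjson input side instead?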

Can you share the logs that show this? It is not clear whether the delay is in the agent fetching logs from the API, processing them, or sending them to the output.

Hi @leandrojmp,

We are seeing the following trace:

"{"log.level":"debug","@timestamp":"2025-05-06T11:50:06.387+0100","message":"HTTP response","transaction.id":"BKS9GQ21SOT1G-97263","http.response.status_code":429,"http.response.body.content":"{"errors":["Rpc Error: Code = ResourceExhausted Desc = Quota Exceeded For Quota Metric 'Read Requests' And Limit 'Read Requests Per Minute Per User' Of Service 'Logging.Googleapis.Com' For Consumer 'Project_number:301743521374'.\nError Details: Name = ErrorInfo Reason = RATE_LIMIT_EXCEEDED Domain = Googleapis.Com Metadata = Map[Consumer:Projects/301743521374 Quota_limit:ReadRequestsPerMinutePerUser Quota_limit_value:60 Quota_location:Global Quota_metric:Logging.Googleapis.Com/Read_requests Quota_unit:1/Min/{Project}/{User} Service:Logging.Googleapis.Com]\nError Details: Name = Help Desc = Request A Higher Quota Limit. Url = Https://Cloud.Google.Com/Docs/Quotas/Help/Request_increase\"]}""

Regards

This is unrelated to the Elastic Agent. You are receiving a 429 error from the endpoint you are querying; the endpoint itself is rate limiting you.
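
Note also that every pagination request within a single poll counts against that quota, so with interval: 30s plus multiple pages per poll you can exceed 60 read requests per minute quickly. The error itself links to Google's quota-increase page; if raising the quota is not an option, the only agent-side lever is to make fewer requests per minute. A rough sketch with illustrative values (you would need to tune them to how many pages each poll actually fetches):

interval: 2m          # poll less often so polling plus pagination stays under the quota
request.retry:
  max_attempts: 5
  wait_min: 10s       # back off longer when a 429 is returned
  wait_max: 60s

Also, your request.rate_limit templates only take effect if the endpoint actually returns those X-Ratelimit-* headers; if it does not, they do nothing.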