Limited to Initial 1000 Logs from API with Elastic Agent

Hi All,

We have configured our elastic-agent.yml to poll logs every 1 minute. However, the API returns a maximum of 1000 logs per poll, so we’re only receiving the first 1000 logs while the remaining logs are not being published.

Could someone please help update the elastic-agent.yml to implement a tailing mechanism using a scan frequency similar to Filebeat, or suggest another method to continuously tail the logs?

Below is our current elastic-agent.yml configuration for reference.

agent:
  logging:
    files:
      keepfiles: 7
      name: elastic-agent
      path: /var/log/elastic-agent/
      permissions: 420
    level: info
    to_files: true

inputs:
  - id: generic-httpjson-sb2-am
    type: tail
    streams:
      - config_version: 2
        data_stream:
          dataset: httpjson.generic
          type: logs
        id: httpjson-httpjson.sandbox2_am
        interval: 30s
        publisher_pipeline.disable_host: true
        request.method: GET
        request.ssl:
          verification_mode: none
        request.transforms:
          - set:
              target: header.x-api-key
              value: ------
          - set:
              target: header.x-api-secret
              value: ------
        request.url: -----=am-everything
        tags:
          - sandbox2_am
        env: sandbox2_am

  - id: generic-httpjson-sb2-idm
    type: tail
    streams:
      - config_version: 2
        data_stream:
          dataset: httpjson.generic
          type: logs
        id: httpjson-httpjson.sandbox2_idm
        interval: 30s
        publisher_pipeline.disable_host: true
        request.method: GET
        request.ssl:
          verification_mode: none
        request.tracer:
          filename: /var/log/elastic-agent/http-request-trace-idm-*.ndjson
          maxbackups: 5
        request.transforms:
          - set:
              target: header.x-api-key
              value: -----
          - set:
              target: header.x-api-secret
              value: -------
        request.url: ------
        tags:
          - sandbox2_idm
        env: sandbox2_idm

outputs:
  default:
    hosts:
      - b-1.amazonaws.com:9094
      - b-2.amazonaws.com:9094
      - b-3.amazonaws.com:9094
    producer:
      compression: gzip
    ssl:
      enabled: true
      truststore_location: /----/kafka.client.truststore.jks
      truststore_password: changeit
    topic: test_app_topic
    type: kafka

Thanks!

This depends on your API.

If it only returns 1000 logs per poll, it probably has some way to paginate requests, so you need to paginate to fetch the remaining logs.

How you do that depends on how the API implements pagination, but the httpjson input supports it: you would use the response.pagination section of the configuration to paginate the requests correctly, as in the example in the documentation.

If you want more examples, you can check how the Elastic Agent integrations that use the httpjson input handle this for different kinds of APIs here.
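As a rough sketch, a cursor-based pagination section often looks like the fragment below; the parameter name `cursor` and the response field `next_cursor` are placeholders for whatever your API actually expects and returns:

```yaml
response.pagination:
  # Hypothetical example: replace "cursor" and "next_cursor" with the
  # parameter and response field your API actually uses.
  - set:
      target: url.params.cursor
      value: '[[.last_response.body.next_cursor]]'
      # Stop paginating once the response no longer contains the field.
      fail_on_template_error: true
```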


Hi @leandrojmp ,

Thank you for providing the links. They were helpful: we are attempting to retrieve logs from the ForgeRock API using the configuration found here: integrations/packages/forgerock/data_stream/am_core/agent/stream/httpjson.yml.hbs at 38acb8874a439c584bdb5502eed4191f31efe25d · elastic/integrations. However, after applying this configuration, we no longer receive any logs in Elastic.

Below is the elastic-agent.yml we are using now:

agent:
  logging:
    files:
      keepfiles: 7
      name: elastic-agent
      path: /var/log/elastic-agent/
      permissions: 420
    level: info
    to_files: true

inputs:
  - id: generic-httpjson-sb2-am
    type: httpjson
    streams:
      - config_version: 2
        data_stream:
          dataset: httpjson.generic
          type: logs
        id: httpjson-httpjson.sandbox2_am
        interval: 30s
        publisher_pipeline.disable_host: true
        request.method: GET
        request.url: "https://..forgeblocks.com/monitoring/logs?source=am-everything"
        request.ssl:
          verification_mode: none
        request.rate_limit:
          limit: '[[.last_response.headers.Get "X-Rate-Limit-Limit"]]'
          remaining: '[[.last_response.headers.Get "X-Rate-Limit-Remaining"]]'
          reset: '[[.last_response.headers.Get "X-Rate-Limit-Reset"]]'
        request.transforms:
          - set:
              target: header.x-api-key
              value: "----"
          - set:
              target: header.x-api-secret
              value: "---"
          - set:
              target: url.params.beginTime
              value: '[[.cursor.last_timestamp]]'
              default: '[[ formatDate (now (parseDuration "-30m")) "2006-01-02T15:04:05-07:00" ]]'
          - set:
              target: url.params.endTime
              value: |-
                [[- $last := (parseDate .cursor.last_timestamp "2006-01-02T15:04:05-07:00") -]]
                [[- $recent := (now (parseDuration "-10s")) -]]
                [[- if $last.Before $recent -]]
                  [[ formatDate $recent "2006-01-02T15:04:05-07:00" ]]
                [[- else -]]
          [[ formatDate ($last.Add (parseDuration "1h")) "2006-01-02T15:04:05-07:00" ]]
                [[- end -]]
        response.split:
          target: body.result
          ignore_empty_value: true
        response.pagination:
          - set:
              target: url.params.endTime
              value: '[[.last_response.url.params.Get "endTime"]]'
          - set:
              target: url.params.beginTime
              value: '[[.last_response.url.params.Get "beginTime"]]'
          - set:
              target: url.params._pagedResultsCookie
              value: '[[.last_response.body.pagedResultsCookie]]'
              fail_on_template_error: true
        cursor:
          last_timestamp:
            value: '[[.last_response.url.params.Get "endTime"]]'
        tags:
          - sandbox2_am

outputs:
  default:
    hosts:
      - b-1.:9094
      - b-2.:9094
      - b-3.:9094
    producer:
      compression: gzip
    ssl:
      enabled: true
      truststore_location: /../kafka.client.truststore.jks
      truststore_password: "---"
    topic: test_app_topic
    type: kafka

Also, below is pipeline.conf file filter that we are using currently :

filter {
  json {
    source => "message"
    target => "parsed_message"
  }

  if [parsed_message][result] {
    split {
      field => "[parsed_message][result]"
    }
    # Prevent overwriting of existing values
    if ![result_timestamp] {
      mutate {
        add_field => { "result_timestamp" => "%{[parsed_message][result][timestamp]}" }
      }
    }
    if ![result_type] {
      mutate {
        add_field => { "result_type" => "%{[parsed_message][result][type]}" }
      }
    }
    if ![result_source] {
      mutate {
        add_field => { "result_source" => "%{[parsed_message][result][source]}" }
      }
    }

    if [parsed_message][result][payload] {
      mutate {
        add_field => { "payload_content" => "%{[parsed_message][result][payload]}" }
      }

      json {
        source => "payload_content"
        target => "fr"
        remove_field => ["payload_content"]
      }

      if [result_source] == "am-core" or [result_source] == "idm-core" {
        fingerprint {
          source => ["[fr][message]", "[fr][timestamp]", "[fr][transactionId]"]
          target => "[@metadata][fingerprint]"
          concatenate_sources => true
          method => "SHA256"
        }
      } else {
        fingerprint {
          source => ["[fr][_id]", "[fr][eventName]"]
          target => "[@metadata][fingerprint]"
          concatenate_sources => true
          method => "SHA256"
        }
      }
    }

    # Handle pagination parameters
    if [parsed_message][pagedResultsCookie] {
      mutate {
        add_field => { "pagination_cursor" => "%{[parsed_message][pagedResultsCookie]}" }
      }
    }

    prune {
      whitelist_names => ["^fr.*$", "^@metadata$", "^fingerprint$", "^@timestamp$", "^tags", "^result_.*$", "^env$", "^parsed_message.*$"]
    }
  }
}

Could you please check whether there is any issue with these configs and suggest any changes?

Thanks in advance!

If there is an Elastic Agent integration, why not use it instead of doing the parsing in Logstash?

What exactly is not working: the data collection in Elastic Agent, or the parsing in Logstash?

It is not clear which one is failing.
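If it helps to determine that, one quick way is to run the filter in isolation; a minimal sketch of a test pipeline, assuming you paste one sample API response on stdin (the filter block is the one you already have):

```
input {
  stdin { }
}

filter {
  # paste the contents of your existing filter { ... } block here
}

output {
  stdout { codec => rubydebug }
}
```

Run it with `bin/logstash -f test-pipeline.conf` and paste one JSON response line: if parsed events come out, the parsing side is fine and the problem is in the collection.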

Hi @leandrojmp ,

Currently, we are not using the Elastic Agent integration, as we are following a legacy approach with a standalone agent. However, we do plan to migrate to Fleet and eliminate the Logstash layer in the future.

Additionally, data collection is not functioning properly; we are only receiving a limited number of logs, and the Elastic Agent log shows the paginated request failing after retries (the long _pagedResultsCookie values are truncated here for readability):

"input_source":"https://forgeblocks.com/monitoring/logs?source=idm-everything","message":"error processing response: Get \"https://forgeblocks.com/monitoring/logs?_pagedResultsCookie=eyJfc29ydEtleXMiOm51bGws…&beginTime=2025-02-10T14%3A12%3A28%2B00%3A00&source=idm-everything\": GET https://forgeblocks.com/monitoring/logs?_pagedResultsCookie=eyJfc29ydEtleXMiOm51bGws…&beginTime=2025-02-10T14%3A12%3A28%2B00%3A00&source=idm-everything giving up after 6 attempt(s)"}

Regards!