HTTPJSON input: share a request identifier among events

Hi,

I've been struggling for a long time to configure Filebeat to poll a RESTful API whose response is a JSON array of multiple objects.
Because I want to sum a metric over time (the metric is present in each object), aggregated by object, I need to identify the batch of documents that come from the same request. Ideally, this identifier would be a timestamp that Kibana can parse as a date, so I can use it as a date histogram field in aggregation-based visualizations.
I've tried:

  • cursor, but I can't manage to store a date value in it (like [[now]])
  • the input state, particularly first_event and last_event (e.g. a custom field like first_event.body.requested_at), but the default value (which I set to [[now]]) is always picked

Any idea how I could get this value? Or is there another strategy that would solve my problem?

I hope this is clear enough. I can give more details if needed.

Thanks
Thomas

Hello @thopic !

If I understand correctly, you need to add a timestamp to all the events coming from the same response, and it needs to be identical in all of them. As you mentioned, you can use the cursor for this purpose. Could you share your config so we can see why you might be having issues with it? Alternatively, you could use the response.transforms config:

  response.transforms:
    - set:
        target: body.ts
        value: "[[formatDate (now)]]"

It would be useful if you could share both your httpjson config and a sample response so I can take a look.

Thanks!

Hi @marc.guasch

Thank you for answering so quickly.

I made many attempts because I hadn't properly understood how the httpjson input handles JSON arrays (cf. a previous topic of mine).

Below is my httpjson config with three different attempts (all failed):

- type: httpjson
  config_version: 2
  interval: 5m
  request.url: https://domain.tld/api/v1/...
# FIRST TRY
#  response.transforms:
#    - set:
#        target: body.requested_at
#        value: '[[.cursor.requested_at]]'
#        default: "[[now]]"
#  cursor:
#    requested_at:
#      value: '[[.first_event.body.requested_at]]'
#      default: "[[now]]"
# SECOND TRY
#  response.transforms:
#    - set:
#        target: body.requested_at
#        value: '[[.first_event.body.requested_at]]'
#        default: "[[now]]"
  processors:
    - decode_json_fields:
        fields: ["message"]
        target: "stats"
    - drop_fields:
        fields: ["message", "stats.rl", "stats.rl_scope", "stats.is_relayed", "stats.pushover_active", "stats.last_pop3_login", "stats.attributes", "stats.relayhost"]
# THIRD TRY
#    - add_id: 
#        target_field: "stats.batch.id"

Both the first and second attempts gave a different timestamp to each JSON object (that is, to each resulting document), and the third attempt gave each one a different unique ID.

A typical response from the server is:

[
  {
    "max_new_quota": 10737418240,
    "username": "info@doman3.tld",
    "rl": false,
    "is_relayed": 0,
    "name": "Full name",
    "active": "1",
    "domain": "doman3.tld",
    "local_part": "info",
    "messages": 152,
    ...
  },
  {
    "max_new_quota": 10737418240,
    "username": "test@doman2.tld",
    "rl": false,
    "is_relayed": 0,
    "name": "Full name",
    "active": "1",
    "domain": "doman2.tld",
    "local_part": "test",
    "messages": 456,
    ...
  },
  ...
]

And this response would create these two documents in ES:

{
  "_index": "stats-filebeat-2022.01",
  "_type": "_doc",
  "_id": "l_sGSn4BtCr2vRGF9Z4J",
  "_score": 1,
  "_source": {
    "@timestamp": "2022-01-11T16:44:36.376Z",
    "host": ...,
    "agent": ...,
    "event": {
      "created": "2022-01-11T16:44:36.376Z"
    },
    "ecs": {
      "version": "1.12.0"
    },
    "input": {
      "type": "httpjson"
    },
    "stats": {
      "active": 1,
      "domain": "doman3.tld",
      "name": "Full name",
      "local_part": "info",
      "max_new_quota": 10737418240,
      "username": "info@doman3.tld",
      "messages": 152
    }
  },
  "fields": ...
},
{
  "_index": "stats-filebeat-2022.01",
  "_type": "_doc",
  "_id": "BytMTX4BtCr2vRGFqsuZ",
  "_score": 1,
  "_source": {
    "@timestamp": "2022-01-11T16:44:36.377Z",
    "host": ...,
    "agent": ...,
    "event": {
      "created": "2022-01-11T16:44:36.377Z"
    },
    "ecs": {
      "version": "1.12.0"
    },
    "input": {
      "type": "httpjson"
    },
    "stats": {
      "active": 1,
      "domain": "doman2.tld",
      "name": "Full name",
      "local_part": "test",
      "max_new_quota": 10737418240,
      "username": "test@doman2.tld",
      "messages": 456
    }
  },
  "fields": ...
}

I did try your config, but there is a 3-second difference between the first and last document, the same as with my own attempts.
Is there something I'm misunderstanding?

Thanks again

Thomas

Okay, I think I know what is going on here.

For the third case, this is expected, as the input processors are executed on each published event.

For the first two: when the httpjson input receives an array response, it is automatically split, generating one document for each object found in it. This implicit top-level split step can't be skipped, and any split defined in the config will behave as if it were nested under it. I couldn't find any reference to this behavior in the documentation, so I created this issue to amend that. Since there is no top-level object common to all documents, response.transforms can't be used directly as I first suggested: as you mentioned, it is executed separately for each document after the split is done.
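
To make the ordering concrete, here is the earlier response.transforms suggestion again, annotated with when it runs. This is only a sketch of the behavior described above, not a different configuration; the target name body.ts is the one from that suggestion.

```yaml
# The response body is a top-level JSON array, so the input splits it
# implicitly into one document per object before the transforms below run.
response.transforms:
  - set:
      target: body.ts
      # Evaluated once per split object, not once per HTTP response,
      # so each document gets its own evaluation of `now`.
      value: "[[formatDate (now)]]"
```

This is consistent with the few seconds of drift reported between the first and last document of a single batch.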

As a workaround, and to avoid doing complicated things with the cursor and templates, you can rely on the Date header, which should be part of the server response:

- type: httpjson
  config_version: 2
  interval: 5m
  request.url: https://domain.tld/api/v1/...
  response.transforms:
    - set:
        target: body.requested_at
        value: '[[.last_response.header.Get "Date"]]'
  processors:
    - decode_json_fields:
        fields: ["message"]
        target: "stats"
    - drop_fields:
        fields: ["message", "stats.rl", "stats.rl_scope", "stats.is_relayed", "stats.pushover_active", "stats.last_pop3_login", "stats.attributes", "stats.relayhost"]

You can format it as you wish with the parseDate and formatDate template functions.
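
As a minimal sketch of that formatting step: an HTTP Date header normally looks like `Tue, 11 Jan 2022 16:44:36 GMT` (an illustrative value), which corresponds to the RFC 1123 layout. It can be parsed and re-emitted as RFC 3339, a format Kibana can use once the field is mapped as a date. This assumes that parseDate accepts the "RFC1123" layout name and that formatDate defaults to RFC 3339, as the httpjson template documentation describes; please check both against your Filebeat version.

```yaml
- type: httpjson
  config_version: 2
  interval: 5m
  request.url: https://domain.tld/api/v1/...
  response.transforms:
    - set:
        target: body.requested_at
        # Assumption: the server sends a standard Date header in RFC 1123 form.
        # parseDate turns the header string into a time value;
        # formatDate re-emits it (RFC 3339 unless another layout is given).
        value: '[[formatDate (parseDate (.last_response.header.Get "Date") "RFC1123")]]'
```

Because every object in the array comes from the same response, the header value, and therefore body.requested_at, is the same for all of them.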

Hope that helps and makes sense!

Hi Marc

Thanks a lot! You've summarized the problem I'm facing very accurately.
I will follow the issue and use your workaround for now (it is clever and seems sufficient for my needs).

Thomas

(I confirm that I managed to get this value, add it to each JSON object, and get the histograms I expected. Thank you for this workaround!)
