Filebeat add_id: ~ does not take effect

I have Filebeat 7.16.2 consuming messages from Kafka 3.4. Even after setting add_id: ~, restarting Filebeat sends the same data to Elasticsearch again.

filebeat.yml:

filebeat.yml: |
  filebeat.inputs:
  - type: kafka
    hosts:
      - 10.43.182.158:9092
    topics: ["yh_standardadapter_api_pro"]
    group_id: "yh_standardadapter_api_pro_filebeat"
    tags: ["yh_standardadapter_api_pro"]
    parsers:
    - ndjson:
      keys_under_root: true
      add_error_key: true
      overwrite_keys: true
    processors:
    - add_id: ~
  processors:
  - drop_fields:
      fields: ["service_id","kafka.partition","input.type","kafka.topic","agent.ephemeral_id","agent.hostname","agent.id","agent.name","agent.type","agent.version","ecs.version","host.name","kafka.key","kafka.offset"]
  output.elasticsearch:
    hosts: 'http://10.43.100.17:9200'
    username: "elastic"
    password: "elastic"
    indices:
      - index: "yh_standardadapter_api_console_pro_%{+yyyy.MM.dd}"
        when.and:
          - equals:
              fields.LoggerType: "Console"
          - contains:
              tags: "yh_standardadapter_api_pro"
      - index: "yh_standardadapter_api_requestlog_pro_%{+yyyy.MM.dd}"
        when.and:
          - equals:
              fields.LoggerType: "RequestLog"
          - contains:
              tags: "yh_standardadapter_api_pro"
      - index: "yh_standardadapter_api_messagetypelog_pro_%{+yyyy.MM.dd}"
        when.and:
          - equals:
              fields.LoggerType: "MessageTypeLog"
          - contains:
              tags: "yh_standardadapter_api_pro"

Does anyone know why this happens?

What do you mean by repeatedly send data? Send the same messages over and over?

Does it do it if you take that add_id out?

When I produce a JSON message like this to Kafka:

{
    "@timestamp":"2023-07-27T14:43:35.778+08:00",
    "message":"[820535e7-22a1-4797-b5df-f646aab2c1b2] Ack server push request, request = NotifySubscriberRequest, requestId = 179",
    "level":"INFO"
}

Filebeat consumes it correctly and it shows up in Elasticsearch, but when I restart Filebeat, it sends the message to Elasticsearch again and an extra record appears. Why does add_id: ~ not take effect here?

Can you show the 2 duplicate docs in Elasticsearch, please?

This sounds more like filebeat re-reading the Kafka queue

Perhaps @leandrojmp might have some insight, as I am not a Kafka expert.

Can you share the duplicate messages in Kibana, as @stephenb asked?

I'm not a Kafka expert, but looking at your config it doesn't seem that Filebeat would re-consume already consumed messages, since the group_id doesn't change.

Could the producer be sending the same message twice to Kafka?

Also, what do you want to achieve with the add_id processor? It just adds a random ID to the event, but if it receives the same message one or more times, the IDs of those events will be different.
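For reference, this is roughly how an explicit add_id configuration would look; if I recall the 7.16 defaults correctly, add_id: ~ writes a freshly generated random ID into @metadata._id for every event:

  processors:
  - add_id:                           # equivalent to "add_id: ~" with the defaults spelled out
      target_field: "@metadata._id"   # where the generated ID is stored
      type: "elasticsearch"           # Elasticsearch-compatible random ID

So two identical messages still end up with two different IDs, and therefore two documents.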

Yes, I can show two duplicate documents in Elasticsearch; Filebeat consumes the same messages from Kafka twice.

This is in my test environment. I use the producer to send only one message to Kafka, so there is always exactly one message in Kafka. While Filebeat is running, that message is delivered to Elasticsearch once. When I restart Filebeat, it delivers the same message to Elasticsearch again, even though no new message arrived in Kafka during the restart. I want to use add_id to generate a unique ID for each message, so that data is not consumed into Elasticsearch repeatedly when Filebeat restarts.

If possible, please edit your Filebeat config and remove the field "kafka.offset" from the list of fields you are dropping, so we can check that it is indeed the same message from Kafka.

The documentation describes the add_id processor as generating a unique ID for each event. Is my understanding wrong?
https://www.elastic.co/guide/en/beats/filebeat/7.16/add-id.html

In my test environment, I can confirm that the message comes from Kafka.

Yes, but please share the duplicated messages that you are seeing in Kibana, and include the kafka.offset field.

If I'm not wrong, it is unique in the sense that each event gets a unique ID, but if Filebeat processes the same message again for some reason, the ID will not be the same.

For example, if you have a log file with the following lines:

first message
first message
second message

Each one of those lines is one event, and each one will have a unique ID. The first two lines are the same, but the ID generated by the add_id processor will be different.
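Roughly what you would get, with made-up IDs just for illustration:

  {"message": "first message",  "@metadata": {"_id": "a1B2c3d4"}}
  {"message": "first message",  "@metadata": {"_id": "e5F6g7h8"}}
  {"message": "second message", "@metadata": {"_id": "i9J0k1l2"}}

Same text in the first two events, but three different IDs, so Elasticsearch stores them as three separate documents.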

Hi, this is what I just simulated. Except for the agent.hostname field, the other fields are exactly the same. Is it because agent.hostname differs that this is not treated as a repeated event?

That means that whenever Filebeat is restarted, each document gets recorded again. Does the add_id processor have no practical effect, then?

Oh no, it still failed. I fixed agent.hostname to filebeat-0.

It just adds a unique ID; it will not help with duplicates. What you want is the fingerprint processor.

This will generate an ID based on some field of your document, like the message field, so if the message is the same, the generated ID will be the same.
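A minimal sketch of that, assuming you hash only the message field (adjust the fields list to whatever uniquely identifies your events):

  processors:
  - fingerprint:
      fields: ["message"]             # fields used to compute the hash
      target_field: "@metadata._id"   # use the hash as the Elasticsearch document _id

Because the ID is derived from the content, re-processing the same Kafka message produces the same _id, and Elasticsearch overwrites the existing document instead of indexing a duplicate.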

But this is not the issue here; the issue is that your kafka input is reading the same offset twice. This should not happen if the group_id is the same, but I'm not sure whether the issue is on the Filebeat side or the Kafka side.

Here is my Kafka manifest file, if you are interested:
https://github.com/huangyutongs/hyt
