Messages from filebeat are correctly consumed into elasticsearch, but when I restart filebeat it sends the same message to elasticsearch again, so there is an extra record. Why does add_id: ~ not take effect here?
Can you share the duplicated messages in Kibana as @stephenb asked?
I'm not a Kafka expert, but looking at your config it doesn't seem that filebeat would re-consume already consumed messages, since the group_id doesn't change.
Could the producer be sending the same message twice to Kafka?
Also, what do you want to achieve with the add_id processor? It just adds a random id to each event, so if filebeat receives the same message more than once, each copy will get a different id.
This is in my test environment. I use the producer to send only one message to Kafka, so there is always exactly one message in Kafka. While filebeat is running, the message is delivered to elasticsearch once. When I restart filebeat, it delivers the same message to elasticsearch again, even though no new message was written to Kafka during the restart. I want to use add_id to generate a unique id for each message, so that the same data is not consumed into elasticsearch repeatedly when filebeat restarts.
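A minimal sketch of the kind of configuration described here (the broker address, topic name, and group_id are placeholders, not the actual values from the setup above):

```yaml
filebeat.inputs:
  - type: kafka
    hosts: ["localhost:9092"]   # placeholder broker address
    topics: ["test-topic"]      # placeholder topic
    group_id: "filebeat"        # consumer group; offsets are tracked per group

processors:
  - add_id: ~                   # adds a randomly generated id to each event

output.elasticsearch:
  hosts: ["localhost:9200"]     # placeholder Elasticsearch address
```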
If possible, please edit your filebeat config and remove the field "kafka.offset" from the list of fields that you are dropping, to check whether it is indeed the same message from Kafka.
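For example, if the drop_fields processor looks something like the sketch below (the field names here are assumed, since the actual config was not shared), keeping kafka.offset in the event lets you compare the offsets of the two documents in Elasticsearch:

```yaml
processors:
  - drop_fields:
      # assumed list of dropped fields; "kafka.offset" is intentionally left out
      # so the two documents can be compared by offset
      fields: ["kafka.partition", "kafka.key"]
      ignore_missing: true
```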
The documentation describes the add_id processor as generating a unique id for an event. Is my understanding wrong? https://www.elastic.co/guide/en/beats/filebeat/7.16/add-id.html
If I'm not wrong, it is unique in the sense that each event will have a unique id, but if filebeat processes the same message again for some reason, the id will not be the same.
For example, if you have a log file with the following lines:
first message
first message
second message
Each one of those lines is one event, and each one will have a unique id. The first two lines are the same, but the id generated by the add_id processor will be different.
Hi, this is what I just simulated. Except for the filebeat agent.hostname field, the other fields are exactly the same. Is it because the agent.hostname field is not the same? Does that mean it is not a repeated event?
It just adds a unique id; it will not help with duplicates. What you want is the fingerprint processor.
This will generate an id based on some field of your document, like the message field, so if the message is the same, the generated id will be the same.
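A minimal sketch of that approach, assuming the message field is what identifies a duplicate: the fingerprint is written to @metadata._id, so Elasticsearch uses it as the document _id, and a re-sent copy of the same message overwrites the existing document instead of creating a new one.

```yaml
processors:
  - fingerprint:
      fields: ["message"]            # field(s) used to compute the hash
      target_field: "@metadata._id"  # becomes the Elasticsearch document _id
```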
But this is not the issue here. The issue is that your kafka input is reading the same offset twice; this should not happen if the group_id is the same, but I'm not sure whether the issue is on the filebeat side or on the Kafka side.