Hi! We have a logging pipeline structured like this:
Application -> Docker JSON log -> Filebeat file harvester -> Kafka topic -> Filebeat retranslating from the topic to Elasticsearch.
Sometimes we see a lot of duplicated events in Elasticsearch. According to the article Deduplicate data | Filebeat Reference [7.17] | Elastic, the add_id processor might help.
Yesterday I applied this configuration and saw that index latency is now 2–3 times higher.
I thought that if Filebeat generates the @metadata._id, latency should be even lower, since Elasticsearch has less work to do, but it isn't. Could you explain why?
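For context, the configuration follows the article's add_id example. A minimal sketch (the input section here is an assumed Docker log setup, not our exact config) looks like:

```yaml
filebeat.inputs:
  - type: container          # assumed input; ours reads Docker JSON logs
    paths:
      - /var/lib/docker/containers/*/*.log

processors:
  # add_id generates a unique ID and stores it in @metadata._id,
  # which Elasticsearch then uses as the document _id at index time.
  - add_id: ~
```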
When you specify a document ID outside of Elasticsearch, e.g. in Filebeat, every index operation is essentially a potential update, which is more expensive than when Elasticsearch is allowed to set the document ID itself (it knows the ID is unique and can just insert the document). This is the price you pay to avoid/limit duplicates.
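To make the cost difference concrete, here is a toy model (not Elasticsearch's actual engine code; `TinyIndex` and its method names are hypothetical) of why an externally supplied ID forces extra work per index operation:

```python
import uuid

class TinyIndex:
    """Toy model of auto-generated vs. externally supplied document IDs."""

    def __init__(self):
        self.docs = {}
        self.lookups = 0  # count of ID-existence checks performed

    def index_auto_id(self, doc):
        # The engine generated the ID itself, so it knows the ID is
        # unique and can append the document without any lookup.
        doc_id = uuid.uuid4().hex
        self.docs[doc_id] = doc
        return doc_id

    def index_with_id(self, doc_id, doc):
        # An externally supplied ID (e.g. from Filebeat's add_id) may
        # already exist, so every index op is a potential update: the
        # engine must first look the ID up before writing.
        self.lookups += 1
        self.docs[doc_id] = doc  # insert, or overwrite an existing doc

idx = TinyIndex()
idx.index_auto_id({"msg": "a"})
idx.index_with_id("fb-1", {"msg": "b"})
idx.index_with_id("fb-1", {"msg": "b"})  # duplicate collapses to an update
print(len(idx.docs), idx.lookups)  # -> 2 2
```

The second write of `fb-1` is exactly the deduplication you want, but every write with an explicit ID pays the lookup, duplicate or not, which is why index latency rises across the board.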