How to avoid double indexing when using rollover indices

Hello All,

I'm using the rollover feature for my indices on a daily basis, along with doc_as_upsert to maintain unique documents only. The rolled-over indices get deleted after 30 days.

I'm seeing a double-indexing issue: every time a new index is created, the same documents get indexed into it again. Because of these duplicates, the Kibana visualizations show wrong values.

I'm aware that through ILM a single index could be kept for N days and then rolled over, but the idea is to maintain only a single ILM policy for all indices, i.e. data should be kept for 1 month (with daily rollover) and then deleted. The challenge is that the indices that are supposed to hold unique values via the Logstash upsert now show duplicates because of the rollover.
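For reference, the policy I have in mind is roughly the sketch below (the policy name `daily-rollover-30d` is just a placeholder; the conditions mirror what I described above, daily rollover and deletion after 30 days):

```
PUT _ilm/policy/daily-rollover-30d
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "1d" }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```

As I understand it, `min_age` in the delete phase is counted from the rollover time, so each backing index would be dropped roughly 30 days after it rolled over.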

We can't afford duplicate documents in search results. What are the possible solutions for this issue?

Thanx

Why do you have duplicate documents coming in?

Hello @Mark_Walkom ,

Thanx for your quick response!

I'm fetching the data for some of our applications through Perl scripts. These scripts run at different intervals, e.g. every 5, 10, 15... minutes, so within a single day there is no duplication. But the next day, when the index rolls over, the same data is fetched by the scripts again, hence the duplication.

I'm trying to find a solution where the previous day's index data is not searched, for the use cases that rely on doc_as_upsert.
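For reference, the relevant part of the Logstash pipeline is roughly like the sketch below (the field names, alias, and host are placeholders, not the exact ones we use):

```
filter {
  # Build a stable ID from the fields that identify a unique document
  fingerprint {
    source => ["app", "metric", "host"]
    concatenate_sources => true
    method => "SHA1"
    target => "[@metadata][doc_id]"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "myapp-metrics"                 # rollover write alias
    document_id => "%{[@metadata][doc_id]}"
    action => "update"
    doc_as_upsert => true
  }
}
```

The `document_id` from the fingerprint is what keeps a single day deduplicated.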

Thanx

Sounds like that won't work because the duplicate is being put into a new index, so there's nothing to update.

Is there a reason you are not using Elastic Agent/Beats?

Yes, I'm using Beats agents: Metricbeat for server metrics and Filebeat for some use cases.
But in most cases the Perl-generated data is handled through a Logstash pipeline, due to its rich plugin features.
I didn't quite understand why you're asking about Elastic Agent here. Would it resolve this issue in any way?

I think that Filebeat should handle the duplicates a little more effectively.

I guess Filebeat would only be helpful in the case of structured log lines (a log ingest pipeline). Here, the data is sometimes fetched from a database and sometimes produced by the scripts themselves, which is why Logstash is used.
How could Filebeat help with duplicates? Can you please explain a bit?

Because it keeps track of what has been read from a file, so it will not usually send the same entry more than once.
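As a minimal sketch, assuming the Perl scripts append to files and Filebeat ships them to Logstash on the default Beats port (paths and hosts here are placeholders):

```
filebeat.inputs:
  - type: filestream            # "log" on older Filebeat versions
    id: perl-script-output      # unique id so registry state is kept per input
    paths:
      - /var/log/myapp/export-*.log

output.logstash:
  hosts: ["localhost:5044"]
```

Filebeat stores the read offset for each file in its registry (under its data directory), so restarting it or re-running the pipeline does not re-ship lines it has already sent.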

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.