How to avoid double indexing when using rollover indices

Hello All,

I'm using the rollover feature for my indices on a daily basis, along with doc_as_upsert to maintain unique documents only. The rolled-over indices get deleted after 30 days.

I'm seeing a double-indexing issue: every time a new index is created, the same documents get indexed into it again. Because of these duplicates, the Kibana visualizations show wrong values.

I'm aware that through ILM a single index could be kept for N days and then rolled over, but the idea is to maintain only a single ILM policy for all indices, i.e. data should be kept for 1 month (with daily rollover) and then deleted. The challenge is that the indices that are supposed to hold unique values via the Logstash upsert now show duplicates because of the rollover.
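For reference, the policy I have in mind is roughly the sketch below (the policy name `daily-rollover-30d` is just a placeholder; the conditions mirror what I described above, daily rollover and deletion after 30 days):

```
PUT _ilm/policy/daily-rollover-30d
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "1d" }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
```

As I understand it, `min_age` in the delete phase is counted from the rollover time, so each backing index would be dropped roughly 30 days after it rolled over.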

We can't afford duplicate documents in search results. What are the possible solutions for this issue?

Thanx

Why do you have duplicate documents coming in?

Hello @Mark_Walkom ,

Thanx for your quick response!

I'm fetching the data for some of our applications through Perl scripts. These scripts run at different intervals, e.g. every 5, 10, 15... minutes, so within a single day there is no duplication. But the next day, when the index rolls over, the same data is fetched by the scripts again, hence the duplication.

I'm trying to find a solution where the previous day's index data is not searched, for the use cases that rely on doc_as_upsert.
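For reference, the relevant part of the Logstash pipeline is roughly like the sketch below (the field names, alias, and host are placeholders, not the exact ones we use):

```
filter {
  # Build a stable ID from the fields that identify a unique document
  fingerprint {
    source => ["app", "metric", "host"]
    concatenate_sources => true
    method => "SHA1"
    target => "[@metadata][doc_id]"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "myapp-metrics"                 # rollover write alias
    document_id => "%{[@metadata][doc_id]}"
    action => "update"
    doc_as_upsert => true
  }
}
```

The `document_id` from the fingerprint is what keeps a single day deduplicated.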

Thanx

Sounds like that won't work because the duplicate is being put into a new index, so there's nothing to update.

Is there a reason you are not using Elastic Agent/Beats?

Yes, I'm using Beats agents: Metricbeat for server metrics and Filebeat for some use cases.
But in most cases the Perl-generated data is handled through a Logstash pipeline, due to its rich plugin features.
I didn't quite understand why you're asking about Elastic Agent here. Would it resolve this issue in any way?

I think that Filebeat should handle the duplicates a little more effectively.

I guess Filebeat would only be helpful in the case of structured log lines (a log ingest pipeline). Here, the data is sometimes fetched from a database and sometimes produced by the scripts themselves, which is why Logstash is used.
How could Filebeat help with duplicates? Can you please explain a bit?

Because it keeps track of what has been read from a file, so it will not usually send the same entry more than once.
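As a minimal sketch, assuming the Perl scripts append to files and Filebeat ships them to Logstash on the default Beats port (paths and hosts here are placeholders):

```
filebeat.inputs:
  - type: filestream            # "log" on older Filebeat versions
    id: perl-script-output      # unique id so registry state is kept per input
    paths:
      - /var/log/myapp/export-*.log

output.logstash:
  hosts: ["localhost:5044"]
```

Filebeat stores the read offset for each file in its registry (under its data directory), so restarting it or re-running the pipeline does not re-ship lines it has already sent.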

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.