How to keep only one copy of the same document across multiple indices (daily indices)

milankamboya · September 23, 2019, 12:58pm

In my project,

I am fetching records from a database and using the primary key as the document id.
My indices are date-based, so a new index is created every day,
for example test-2019.09.23 and test-2019.09.22.
The Logstash job runs every 5 minutes. This is a requirement.
If a record is updated on the same day, Elasticsearch handles it properly (the existing document is overwritten), but if it is updated on a later day, it is treated as a new document in the new index.

This causes multiple duplicate records to show up when searching and fetching.

I want to keep only one copy (the latest) of each document across all indices.

How can I handle this?
Any configuration suggestions or a sample example would be helpful.

jay224 · September 23, 2019, 2:20pm

Unfortunately, there is no simple out-of-the-box approach to handling duplicates across multiple indices.

The solution will have to be a separate de-dupe process.

One option, if you have a last_update date field that changes whenever a record is updated, is to use that information to handle updates separately from new records. For an update, search all current and old indices through an alias for the _id, then upsert into the index that already holds the document. This comes with some performance hit, as you will be searching before inserting.
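A rough sketch of that search-then-upsert routing, in Python. It assumes the daily indices share the prefix test, and that you pass in some search callable of your own that returns the list of hits; the helper names (ids_query, daily_index, target_index) are made up for illustration and are not part of any library.

```python
from datetime import date
from typing import Callable, Dict, List


def ids_query(doc_id: str) -> Dict:
    """Search body that matches a document by its _id (Elasticsearch `ids` query)."""
    return {"query": {"ids": {"values": [doc_id]}}, "_source": False, "size": 10}


def daily_index(prefix: str, day: date) -> str:
    """Name of the daily index, e.g. test-2019.09.23."""
    return f"{prefix}-{day:%Y.%m.%d}"


def target_index(
    doc_id: str,
    search: Callable[[str, Dict], List[Dict]],  # hypothetical: (index pattern, body) -> hits
    prefix: str,
    today: date,
) -> str:
    """Pick the index to write to: the one already holding doc_id, else today's."""
    hits = search(f"{prefix}-*", ids_query(doc_id))
    if hits:
        # If duplicates already exist, yyyy.MM.dd names sort chronologically,
        # so max() selects the most recent index that contains the document.
        return max(hit["_index"] for hit in hits)
    return daily_index(prefix, today)
```

With the official Python client, the search callable could be a thin wrapper that returns es.search(index=pattern, body=body)["hits"]["hits"], and the record would then be indexed into the returned index with the same id. The extra lookup per record is the performance hit mentioned above.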

Or you can continue creating duplicates and run a de-dupe aggregation at a regular interval that identifies duplicates, deletes the older (original) record, and leaves the newer (updated) one. With this strategy your indices will contain duplicates for some time, between the moment an updated record is created and the next de-dupe run.
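A minimal sketch of what such a de-dupe pass could look like, again in Python. It assumes the primary key is also stored in a keyword field, called record_id here (a hypothetical name), because aggregating on _id directly is discouraged, and that last_update is a date field. The function names are illustrative only.

```python
from typing import Dict, List, Tuple


def duplicates_agg(key_field: str = "record_id", date_field: str = "last_update") -> Dict:
    """Search body: bucket by key, keep only keys seen 2+ times, newest copy first."""
    return {
        "size": 0,  # only the aggregation result is needed
        "aggs": {
            "dupes": {
                "terms": {"field": key_field, "min_doc_count": 2, "size": 1000},
                "aggs": {
                    "copies": {
                        "top_hits": {
                            "size": 10,
                            "sort": [{date_field: {"order": "desc"}}],
                            "_source": False,
                        }
                    }
                },
            }
        },
    }


def stale_copies(response: Dict) -> List[Tuple[str, str]]:
    """Return (index, id) pairs for every copy except the newest one per key."""
    stale = []
    for bucket in response["aggregations"]["dupes"]["buckets"]:
        hits = bucket["copies"]["hits"]["hits"]
        # hits are sorted newest first, so everything after the first is outdated
        stale.extend((hit["_index"], hit["_id"]) for hit in hits[1:])
    return stale
```

The body would be sent as a search against the alias or the test-* pattern, and each returned (index, id) pair deleted, for example through the bulk API. Since the terms aggregation returns a limited number of buckets, the pass may need to be repeated until no duplicates come back.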

system · October 21, 2019, 2:20pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
