The data come from a Transform, but they are not updated afterwards (I have a bucket selector that puts the Transform on hold if a bucket is not complete). The duplicated rows are identical in both the previous and the current index.
Elasticsearch version 7.17.3
Filebeat > Elasticsearch ingest node > Transform > ILM
That would probably require you to reindex data that is no longer being updated before deleting it from the original index. It may be difficult to make this consistent, though.
Would a query such as "reindex data between the day before yesterday at 00:00:00 and yesterday at 00:00:00", followed by "delete that data for the same date range", not be consistent?
Both operations take some time to run, and during that time you would have duplicates; there could also be failures that need to be handled. This also assumes that no late-arriving data updates any of the documents that have already been transferred, and I suspect that depends on the nature and logic of the transform.
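A minimal sketch of the reindex-then-delete approach, assuming a hypothetical timestamp field named "timestamp" and hypothetical index names. It only builds the request bodies for POST _reindex and POST <index>/_delete_by_query; sending them, checking the reindex result before deleting, and retrying on failure are left out. The window is computed once as absolute UTC timestamps rather than date math such as "now-2d/d", so both requests target exactly the same documents even if they run on either side of midnight. The caveats above (duplicates while both requests run, late updates) still apply.

```python
from datetime import datetime, timedelta, timezone


def day_window(now: datetime) -> tuple[str, str]:
    """Return [day before yesterday 00:00, yesterday 00:00) as ISO-8601 UTC strings."""
    today = now.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    start = today - timedelta(days=2)
    end = today - timedelta(days=1)
    return start.isoformat(), end.isoformat()


def range_query(field: str, start: str, end: str) -> dict:
    # gte/lt makes the window half-open, so consecutive daily runs never overlap.
    return {"range": {field: {"gte": start, "lt": end}}}


def build_requests(source: str, dest: str, field: str, now: datetime) -> tuple[dict, dict]:
    """Build the _reindex body and the matching _delete_by_query body (hypothetical names)."""
    start, end = day_window(now)
    query = range_query(field, start, end)
    reindex_body = {"source": {"index": source, "query": query}, "dest": {"index": dest}}
    delete_body = {"query": query}  # same query object, so both target the same window
    return reindex_body, delete_body


if __name__ == "__main__":
    now = datetime(2022, 6, 15, 10, 30, tzinfo=timezone.utc)
    reindex_body, delete_body = build_requests(
        "transform-current", "transform-2022.06.13", "timestamp", now
    )
    print(reindex_body)
    print(delete_body)
```

The delete body would only be sent to transform-current/_delete_by_query after the reindex has completed without failures.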
If this solution requires you to perform a reindex as well as a delete-by-query for every document, would it not be better to have a single transform index with a larger number of primary shards, and to remove documents with delete-by-query once they no longer need to be retained? It would be simpler and put less load on the cluster.
But what if, for some reason, I need to keep the data for 6 months, or a year?
Or what if I just want to change the template because I have a new field or a new way to group the data?
Storing data in 10 indices with 1 primary shard each or in 1 single index with 10 primary shards is basically the same. Having only one index is simpler and requires a lot less work and load on the cluster.
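A minimal sketch of the single-index alternative, assuming a hypothetical timestamp field named "timestamp" and a retention period of about 6 months (180 days). It builds the settings body for creating one index with 10 primary shards and the body for a periodic POST <index>/_delete_by_query that removes documents older than the retention period. Here Elasticsearch date math ("now-180d/d", rounded down to the start of the day) is fine, because there is only one request per run.

```python
def create_index_body(primary_shards: int) -> dict:
    """Settings body for PUT <index>; one index with several primary shards."""
    if primary_shards < 1:
        raise ValueError("primary_shards must be at least 1")
    return {"settings": {"number_of_shards": primary_shards}}


def retention_delete_body(field: str, retain_days: int) -> dict:
    """Body for POST <index>/_delete_by_query removing documents older than retain_days."""
    if retain_days < 1:
        raise ValueError("retain_days must be at least 1")
    # "/d" rounds to the start of the day, so each daily run deletes whole days.
    return {"query": {"range": {field: {"lt": f"now-{retain_days}d/d"}}}}


if __name__ == "__main__":
    print(create_index_body(10))
    print(retention_delete_body("timestamp", 180))
```

Changing the retention to a year is then just a different retain_days value, with no change to the index layout.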
I am not sure how you handle changes to transforms, but I do not see how switching to a different time-based index is any different from starting to use a new single index. You can still query the old and the new single index through an index pattern or an alias, and you would have the same issues.
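A minimal sketch of querying an old and a new single index together, assuming hypothetical index names transform-v1 and transform-v2 and a hypothetical alias transform-all. It builds the body for POST _aliases with one "add" action per index; a search against the alias (or against an index pattern such as transform-v*) then covers both indices, and any document present in both would still be returned twice.

```python
def alias_actions(alias: str, indices: list[str]) -> dict:
    """Body for POST _aliases pointing one alias at several indices (hypothetical names)."""
    if not indices:
        raise ValueError("at least one index is required")
    return {"actions": [{"add": {"index": index, "alias": alias}} for index in indices]}


if __name__ == "__main__":
    print(alias_actions("transform-all", ["transform-v1", "transform-v2"]))
```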