Elastic cluster: 500 GB disk,
3 master nodes and 3 data/ingest nodes (1 vCPU, 4 GB RAM).
Currently I have an index of ~200 GB in 1 shard, covering 3 months of usage. I would like to delete 2 months of data and keep only the most recent month, based on Timestamp.
What is the best approach for this? Is it enough to just use DeleteByQuery? Should I run this query once for the whole 2 months, or is it better to run it in portions, for example day by day?
Will I have issues due to the large segment size (~5 GB)?
My index is named like ServiceActivityLogDev. How can I use ILM in this case, since no date pattern has been applied to the index name yet? How do I delete data in the existing index?
If you are deleting the majority of the data in an index, it is probably better to reindex the data you want to keep than to use delete by query. When you reindex, you may want to consider switching to time-based indices, as this will make it easier to purge old data in the future.
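As a rough illustration of the reindex approach, here is a minimal sketch that builds the request body for `POST _reindex`, copying only documents from the last month into a new index. The field name `Timestamp` is taken from the question; the index names and the helper `build_reindex_body` are hypothetical. The names are lowercased because Elasticsearch index names must be lowercase. The sketch only builds and prints the JSON body, it does not contact a cluster.

```python
import json


def build_reindex_body(source_index, dest_index,
                       timestamp_field="Timestamp", keep_from="now-1M/d"):
    """Build a body for POST _reindex that keeps only recent documents.

    timestamp_field and keep_from are assumptions: adjust them to the
    actual mapping and retention window.
    """
    return {
        "source": {
            "index": source_index,
            # Range query: copy only documents at or after keep_from.
            "query": {"range": {timestamp_field: {"gte": keep_from}}},
        },
        "dest": {"index": dest_index},
    }


# Hypothetical index names, for illustration only.
body = build_reindex_body("serviceactivitylogdev",
                          "serviceactivitylogdev-2020.03")
print(json.dumps(body, indent=2))
```

Once the reindex has finished and the document count in the new index has been checked, the old index can be dropped in one operation, which is much cheaper than deleting most of its documents one by one.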
What do you mean by reindexing? Is it an approach like creating an index named ServiceActivityLogDev-17.03.2020 today, ServiceActivityLogDev-18.03.2020 tomorrow, and so on, and indexing new documents into those, so that it is still possible to search both ServiceActivityLogDev and ServiceActivityLogDev-17.03.2020, ServiceActivityLogDev-18.03.2020, and then in a month just delete the large ServiceActivityLogDev?
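The daily naming scheme described above could be sketched roughly as follows. The helpers `daily_index_name` and `search_pattern` are hypothetical, and the sketch assumes a lowercase prefix (Elasticsearch requires lowercase index names) and a year-first date suffix so that names sort chronologically, rather than the day-first format in the question.

```python
from datetime import date


def daily_index_name(prefix, day):
    """Return a per-day index name such as 'prefix-2020.03.17'."""
    # Year-first suffix is an assumption; it keeps names in date order.
    return f"{prefix}-{day:%Y.%m.%d}"


def search_pattern(prefix):
    """Wildcard pattern matching the old index and every daily index."""
    return f"{prefix}*"


name = daily_index_name("serviceactivitylogdev", date(2020, 3, 17))
pattern = search_pattern("serviceactivitylogdev")
print(name)
print(pattern)
```

Searching against the wildcard pattern covers both the original large index and the new daily ones, so the large index can be deleted later without changing the search requests.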