I'm trying to figure out whether Elasticsearch will remove the segments of the data I delete on its own, or whether I need to delete and reindex.
Every week we index millions of documents, and every week indexing gets slower. Our process goes something like this: we have a Camel job that pulls down all of our mainframe data for the week, which takes about 5 minutes. Then we use that data to insert into Elasticsearch. We have 2 client nodes, 1 master node, and 6 data nodes, with 3 indices and 24 shards per node. As of now it takes 23 hours to index 1 million documents (750 bytes each), where it used to take about 30 minutes.
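To put those numbers in perspective, here is a rough back-of-the-envelope calculation of the throughput before and now. It only uses the figures quoted above (1 million documents, 750 bytes each, 30 minutes versus 23 hours); the function and variable names are just illustrative, and it assumes the rate is roughly constant over the run.

```python
def docs_per_second(doc_count: int, elapsed_seconds: float) -> float:
    """Average indexing throughput in documents per second."""
    return doc_count / elapsed_seconds


DOC_COUNT = 1_000_000   # documents per weekly run (from the post)
DOC_SIZE_BYTES = 750    # approximate size of one document (from the post)

before = docs_per_second(DOC_COUNT, 30 * 60)     # run used to take 30 minutes
now = docs_per_second(DOC_COUNT, 23 * 60 * 60)   # run now takes 23 hours
slowdown = before / now
total_mb = DOC_COUNT * DOC_SIZE_BYTES / 1_000_000  # raw payload in megabytes

print(f"payload: {total_mb:.0f} MB")
print(f"before: {before:.1f} docs/s, now: {now:.1f} docs/s")
print(f"slowdown: {slowdown:.0f}x")
```

So the whole weekly payload is only about 750 MB, and the rate has dropped from roughly 556 documents per second to roughly 12, about a 46x slowdown.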
How can I improve performance? Is Elasticsearch's garbage collection not optimized? Decreasing the number of shards did not help either.