Running ELK stack 7.13.2
Reindexing is a slow process for us; it takes a day or more.
We have 10 indices of 101.4 GB, and reindexing this much data is costly. Even after we stopped Logstash, it made no difference. Also, while we reindex one 100 GB index we cannot reindex another index of the same size at the same time, because we think Elasticsearch could crash.
If we have 20 indices, it will take 20 days, because we reindex them one by one.
One more point: in Logstash we use different pipeline IDs, so while we reindex the indices of one pipeline ID we stop that pipeline and the other pipelines keep running.
So is there any approach that would minimize the time?
We are using a multi-node cluster: 3 master nodes, 6 data nodes, and 2 coordinating nodes.
How do you do the reindexing? Are you indexing from remote or from local?
I do not know if you have already tried this, but I can recommend:
set refresh_interval of the target index to -1 to completely disable refreshes during the reindex. Attention: you will not see any updates until a manual refresh
set number_of_replicas of the target index to 0 during the reindex
try to increase the size property in the reindex body (source.size, default: 1000). Larger batches can greatly improve speed. Attention: when reindexing from remote, size * doc_size must be smaller than 100 MB
Reindexing from a remote server uses an on-heap buffer that defaults to a maximum size of 100mb. If the remote index includes very large documents you’ll need to use a smaller batch size.
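To make the batch size rule above concrete, here is a minimal Python sketch. It only builds the request bodies and does not talk to a cluster. The helper names, the index names, and the 50 KB average document size are made up for illustration, and the cap of 10000 assumes the default index.max_result_window, so adjust it for your own setup.

```python
import json

# Default on-heap buffer for reindex-from-remote (100mb), per the docs quoted above.
REMOTE_BUFFER_BYTES = 100 * 1024 * 1024


def max_batch_size(avg_doc_bytes, buffer_bytes=REMOTE_BUFFER_BYTES, cap=10000):
    """Estimate the largest source.size whose batch still fits in the buffer.

    cap=10000 is an assumption based on the default index.max_result_window.
    """
    if avg_doc_bytes <= 0:
        raise ValueError("avg_doc_bytes must be positive")
    return max(1, min(cap, buffer_bytes // avg_doc_bytes))


def bulk_load_settings():
    """Settings to apply to the target index while the reindex runs."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def reindex_body(source_index, dest_index, batch_size):
    """Body for POST _reindex with a custom batch size (source.size)."""
    return {
        "source": {"index": source_index, "size": batch_size},
        "dest": {"index": dest_index},
    }


if __name__ == "__main__":
    # Hypothetical numbers: documents of about 50 KB each.
    size = max_batch_size(50 * 1024)
    print(json.dumps(bulk_load_settings()))
    print(json.dumps(reindex_body("packetbeat-old", "packetbeat-new", size)))
```

If the documents are small, the cap and not the buffer becomes the limit, so it is still worth testing a few batch sizes on one index before running the rest.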
What we are actually trying to do: we have 20 different indices of 100 GB. The method we follow for reindexing is this. Suppose we are using Packetbeat, Winlogbeat, and other data sources.
If Packetbeat has 10 indices, we stop the Packetbeat pipeline (Winlogbeat and the other data sources keep running), then we reindex the Packetbeat indices one by one. When that completes, we start the Packetbeat pipeline again. This is the approach we are going to use.
We are looking for an effective way of reindexing that takes less time.
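The one-by-one workflow described above could be laid out as an ordered list of REST calls, roughly like the sketch below. It only plans the requests and sends nothing. The index names, the "-reindexed" suffix, and restoring replicas to 1 are assumptions for illustration; it also assumes the target index gets its mappings from an index template. Stopping and restarting the Logstash pipeline happens outside this plan.

```python
def reindex_plan(indices, suffix="-reindexed", batch_size=1000, replicas_after=1):
    """Return (method, path, body) tuples for reindexing indices one by one.

    Each index gets four steps: create the target with refresh disabled and
    no replicas, run the reindex, restore the settings, then refresh.
    """
    steps = []
    for name in indices:
        dest = name + suffix  # hypothetical naming scheme for the target index
        # 1. Create the target index with settings tuned for bulk loading.
        steps.append(("PUT", f"/{dest}", {
            "settings": {"index": {"refresh_interval": "-1",
                                   "number_of_replicas": 0}}
        }))
        # 2. Copy the documents. The caller must wait for this to finish
        #    before sending the next step.
        steps.append(("POST", "/_reindex", {
            "source": {"index": name, "size": batch_size},
            "dest": {"index": dest},
        }))
        # 3. Restore settings; a null refresh_interval resets it to the default.
        steps.append(("PUT", f"/{dest}/_settings", {
            "index": {"refresh_interval": None,
                      "number_of_replicas": replicas_after}
        }))
        # 4. Make the copied documents visible to search.
        steps.append(("POST", f"/{dest}/_refresh", None))
    return steps


if __name__ == "__main__":
    # Hypothetical index names for one stopped pipeline.
    for method, path, body in reindex_plan(["packetbeat-a", "packetbeat-b"]):
        print(method, path, body)
```

Keeping the plan as data makes it easy to review the order before anything touches the cluster, and to run it with whatever HTTP client you already use.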