Reindexing is a slow process

Running ELK stack 7.13.2.
Reindexing is a slow process; it takes a day or more.
We have 10 indices of 101.4 GB. Reindexing this much data is costly, and stopping Logstash did not make it any faster. After reindexing one 100 GB index we hold off on starting another index of the same size, because we are afraid Elasticsearch could crash.
With 20 indices it will take 20 days, because we reindex them one by one.
One more point: in Logstash we use different pipeline IDs, so while we reindex the indices of one pipeline ID we stop that pipeline and the other pipelines keep running.

So is there any approach that would minimize the time?
We are running a multi-node cluster: 3 master nodes, 6 data nodes, and 2 coordinating nodes.

Hi,

How do you do the reindexing? Are you indexing from remote or from local?

I do not know if you have already tried this, but I can recommend:

  • set refresh_interval of the target index to -1 to completely disable refreshes during the reindex. Attention: you will not see any updates until a manual refresh
  • set number_of_replicas of the target index to 0 during the reindex
  • try to increase the size property in the reindex body (default: 1000). Larger batches can greatly improve speed. Attention: when reindexing from remote, size*doc_size must be smaller than 100MB
Reindexing from a remote server uses an on-heap buffer that defaults to a maximum size of 100mb. If the remote index includes very large documents you’ll need to use a smaller batch size.
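
The three settings above can be sketched as request bodies. This is only an illustration: the helper names and the index names are hypothetical, the "restore" defaults are assumptions (use whatever values the index had before), and the bodies still have to be sent with whichever client you use (`PUT /<target>/_settings` and `POST /_reindex`).

```python
import json

# Hypothetical helpers. Only the setting and field names
# (refresh_interval, number_of_replicas, source.size) come from Elasticsearch.


def bulk_load_settings():
    """Body for PUT /<target>/_settings before the reindex starts."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restore_settings(refresh_interval="1s", replicas=1):
    """Body for PUT /<target>/_settings after the reindex has finished.

    The default values are assumptions, not values taken from this thread.
    """
    return {"index": {"refresh_interval": refresh_interval,
                      "number_of_replicas": replicas}}


def reindex_body(source_index, dest_index, batch_size=1000):
    """Body for POST /_reindex; source.size is the batch size."""
    return {"source": {"index": source_index, "size": batch_size},
            "dest": {"index": dest_index}}


def fits_remote_buffer(batch_size, avg_doc_bytes,
                       buffer_bytes=100 * 1024 * 1024):
    """Rough check, relevant only when reindexing from remote:
    the whole batch has to fit into the on-heap buffer."""
    return batch_size * avg_doc_bytes < buffer_bytes


# Example with made-up index names and a larger batch size.
print(json.dumps(bulk_load_settings()))
print(json.dumps(reindex_body("packetbeat-old", "packetbeat-new", 5000)))
print(json.dumps(restore_settings()))
```

For a local reindex the buffer check does not apply; it is included only because of the warning about reindexing from remote.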

Best regards
Wolfram

Sorry @Wolfram_Haussig, I don't know what remote is.

It is local... I will try this.


It takes 4 hours to reindex 20 GB of data. One more question: can we reindex another index in parallel?

Just driving by....

In addition to the very good suggestions from @Wolfram_Haussig....

You can also try using Logstash to read from the source index and write to the target index. It's possible you may get better performance.

I guess I am also a little curious about what you are trying to accomplish with all the reindexing?

What we are actually trying to do: we have 20 different indices of 100 GB. Suppose we are using Packetbeat, Winlogbeat, and other data sources.
If Packetbeat has 10 indices, we stop the Packetbeat pipeline (Winlogbeat and the other data sources keep running), then reindex the Packetbeat indices one by one. When that completes, we start the Packetbeat pipeline again. This is the method we are going to use.

We are looking for an effective way of reindexing that takes less time.
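
On the earlier question about reindexing another index in parallel: the reindex API accepts `wait_for_completion=false`, which starts the reindex as a background task, and `slices`, which splits one reindex into parallel sub-tasks. A minimal sketch of building such requests follows; the helper name and the index names are hypothetical, and whether the cluster can handle several reindex tasks at once is something to test on a small index first.

```python
from urllib.parse import urlencode


def async_reindex_request(source_index, dest_index, slices="auto"):
    """Return (method, path, body) for a background, sliced reindex.

    wait_for_completion=false makes Elasticsearch answer with a task id
    instead of blocking until the reindex is done.
    """
    query = urlencode({"wait_for_completion": "false", "slices": slices})
    body = {"source": {"index": source_index},
            "dest": {"index": dest_index}}
    return "POST", "/_reindex?" + query, body


# Made-up index names: one request per index, sent one after another
# without waiting for the previous reindex to finish.
pairs = [("packetbeat-a", "packetbeat-a-v2"),
         ("packetbeat-b", "packetbeat-b-v2")]
requests_to_send = [async_reindex_request(src, dst) for src, dst in pairs]

for method, path, body in requests_to_send:
    print(method, path, body)
```

Each response contains a task id, which can be checked with `GET /_tasks/<task_id>` to see when that reindex has finished.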
