Reindexing is a slow process

Aniket_Pant · August 18, 2021, 5:51am

Running ELK stack 7.13.2.
Reindexing is a slow process; it takes a day or more.
We have 10 indices of 101.4 GB, and reindexing this much data is costly. Stopping Logstash during the reindex made no difference. While one 100 GB index is being reindexed, we do not start another index of the same size, because we think Elasticsearch could crash.
With 20 indices it would take 20 days, because we reindex them one by one.
One more point: in Logstash we use a different pipeline ID per data source, so while we reindex the indices of one pipeline we stop only that pipeline, and the other pipelines keep running.

So is there any approach that would reduce the time?
We run a multi-node cluster: 3 master nodes, 6 data nodes, and 2 coordinating nodes.

Wolfram_Haussig · August 19, 2021, 7:57am

Hi,

How do you do the reindexing? Are you indexing from remote or from local?

I do not know if you have already tried this, but I can recommend:

Set refresh_interval of the target index to -1 to completely disable refreshes during the reindex. Attention: you will not see any updates until a manual refresh.
Set number_of_replicas of the target index to 0 during the reindex.
Try increasing the size property in the source section of the reindex body (default: 1000). Larger batches can greatly improve speed. Attention: when reindexing from remote, size * doc_size must be smaller than 100 MB.
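
The three suggestions above could be combined roughly as in the following Python sketch. It uses only the standard library. The cluster URL, the absence of authentication, the batch size of 5000, and the values restored afterwards (refresh_interval of 1s, one replica) are assumptions for illustration and not taken from this thread. The target index must already exist before its settings can be changed.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster, no authentication


def bulk_load_settings():
    """Settings for the target index while the reindex is running."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restore_settings(refresh_interval="1s", replicas=1):
    """Settings to put back afterwards (assumed values, adjust to your index)."""
    return {"index": {"refresh_interval": refresh_interval,
                      "number_of_replicas": replicas}}


def reindex_body(source, dest, batch_size=5000):
    """Reindex request body with a larger scroll batch than the default 1000."""
    return {"source": {"index": source, "size": batch_size},
            "dest": {"index": dest}}


def send(method, path, body=None):
    """Send one JSON request to Elasticsearch and return the parsed response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(ES_URL + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def reindex_with_tuned_settings(source, dest, batch_size=5000):
    """Disable refresh and replicas, reindex, then restore and refresh."""
    send("PUT", f"/{dest}/_settings", bulk_load_settings())
    try:
        # This call blocks until the reindex has finished.
        return send("POST", "/_reindex", reindex_body(source, dest, batch_size))
    finally:
        send("PUT", f"/{dest}/_settings", restore_settings())
        send("POST", f"/{dest}/_refresh")
```

Restoring the settings in a finally block means the target index gets its replicas and refresh interval back even if the reindex request fails.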

Reindexing from a remote server uses an on-heap buffer that defaults to a maximum size of 100mb. If the remote index includes very large documents you’ll need to use a smaller batch size.
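
For the remote case, the rule size * doc_size < 100 MB can be turned into a quick estimate of a safe batch size. The following is a minimal sketch: the function name, the 50% safety margin, and reading 100mb as 100 * 1024 * 1024 bytes are assumptions, and the average document size has to be measured on your own data.

```python
REMOTE_BUFFER_BYTES = 100 * 1024 * 1024  # assumed byte value of the 100mb default


def max_remote_batch_size(avg_doc_bytes, buffer_bytes=REMOTE_BUFFER_BYTES,
                          safety=0.5):
    """Largest batch size whose documents fit into a fraction of the buffer.

    safety is a hypothetical margin that leaves room for documents
    larger than the average.
    """
    if avg_doc_bytes <= 0:
        raise ValueError("avg_doc_bytes must be positive")
    return max(1, int(buffer_bytes * safety) // avg_doc_bytes)


# Documents of about 2 KiB allow large batches ...
print(max_remote_batch_size(2 * 1024))   # 25600
# ... while documents of about 50 KiB need much smaller ones.
print(max_remote_batch_size(50 * 1024))  # 1024
```

This limit applies only to reindexing from a remote cluster; a local reindex is not bound by that buffer.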

Best regards
Wolfram

Aniket_Pant · August 19, 2021, 9:10am

Sorry @Wolfram_Haussig, I don't know what remote means.

Aniket_Pant · August 19, 2021, 9:52am

It is local. I will try this.

Aniket_Pant · August 21, 2021, 7:30am

It takes 4 hours to reindex 20 GB of data. One more question: can we reindex another index in parallel?
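
On the parallel question: the reindex API accepts wait_for_completion=false, which returns a task ID immediately, and a slices parameter that splits one reindex into parallel sub-tasks. The following Python sketch starts several reindex tasks and polls the tasks API until they finish. The cluster URL, the missing authentication, the polling interval, and the index names in the usage note are assumptions. Whether the cluster can sustain several reindexes at once depends on its resources, so it is safer to start with two and watch the load.

```python
import json
import time
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster, no authentication


def reindex_task_request(source, dest, slices="auto"):
    """Return (path, body) for a sliced reindex that runs as a background task."""
    query = urllib.parse.urlencode(
        {"wait_for_completion": "false", "slices": slices})
    body = {"source": {"index": source}, "dest": {"index": dest}}
    return f"/_reindex?{query}", body


def send(method, path, body=None):
    """Send one JSON request to Elasticsearch and return the parsed response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(ES_URL + path, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def start_reindex(source, dest):
    """Start one background reindex and return its task ID."""
    path, body = reindex_task_request(source, dest)
    return send("POST", path, body)["task"]


def wait_for_tasks(task_ids, poll_seconds=30):
    """Poll the tasks API until every given task reports completed."""
    pending = set(task_ids)
    while pending:
        for task_id in list(pending):
            if send("GET", f"/_tasks/{task_id}")["completed"]:
                pending.discard(task_id)
        if pending:
            time.sleep(poll_seconds)
```

With hypothetical index names, two indices could be reindexed side by side with `wait_for_tasks([start_reindex("packetbeat-a", "packetbeat-a-v2"), start_reindex("packetbeat-b", "packetbeat-b-v2")])`.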

stephenb · August 21, 2021, 2:35pm

Just driving by....

In addition to the very good suggestions from @Wolfram_Haussig....

You can also try using Logstash to read from the source index and write to the target index. It's possible you may get better performance.

I guess I am also a little curious about what you are trying to accomplish with all the reindexing?

Aniket_Pant · September 14, 2021, 6:33am

What we are actually trying to do: we have 20 different indices of 100 GB. Our method for reindexing is as follows. Suppose we use Packetbeat, Winlogbeat, and other data sources.
If Packetbeat has 10 indices, we stop the Packetbeat pipeline (Winlogbeat and the other data sources keep running), then reindex the Packetbeat indices one by one. When that completes, we start the Packetbeat pipeline again. This is the approach we are going to use.

We are looking for an effective way of reindexing that takes less time.
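
As a rough planning aid, the rate reported earlier in the thread (4 hours for 20 GB) can be extrapolated to 20 indices of 100 GB. This is a back-of-the-envelope sketch only. It assumes throughput scales linearly with index size and that indices reindexed in parallel do not slow each other down, which is optimistic and not confirmed anywhere in this thread.

```python
import math


def reindex_hours(index_gb, measured_gb=20.0, measured_hours=4.0):
    """Extrapolate the measured rate (4 hours for 20 GB) to one index."""
    return index_gb * measured_hours / measured_gb


def total_hours(n_indices, index_gb, parallel=1):
    """Total time when `parallel` indices are reindexed at the same time.

    Assumes parallel reindexes do not slow each other down.
    """
    waves = math.ceil(n_indices / parallel)
    return waves * reindex_hours(index_gb)


print(reindex_hours(100))                 # 20.0 hours for one 100 GB index
print(total_hours(20, 100))               # 400.0 hours one by one
print(total_hours(20, 100, parallel=4))   # 100.0 hours, four at a time
```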

