How to improve the performance of re-indexing from Logstash?

Hi There,

We have a 460 GB index that needs to be re-indexed to make all fields not_analyzed.

We started re-indexing around 8 days ago and it is still in progress (though there have been environment issues: a shards-unavailable exception).

Surprisingly, I can see the new index size reach 330 GB and then, a few minutes later, drop back to 310 GB. It has been going on like this for the last 5 days, with no increase in index size or document count.

Are we missing any configuration here? Please help.

We have:
refresh_interval: -1
replicas: 0
indices.memory.index_buffer_size: 25% (25% of 56 GB)
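(For context: the first two are dynamic index settings. A minimal sketch of applying them, assuming the target index it_customevent from the Logstash output below. indices.memory.index_buffer_size is a node-level setting, so it lives in elasticsearch.yml and needs a node restart.)

PUT it_customevent/_settings
{
  "index": {
    "refresh_interval": "-1",
    "number_of_replicas": 0
  }
}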

Our Logstash configuration is as below:

input {
  elasticsearch {
    hosts => ["10.158.36.199"]
    index => "customevent"
    size => 5000        # scroll batch size
    scroll => "20m"     # keep the scroll context alive for 20 minutes
    docinfo => true     # expose _index, _type and _id under [@metadata]
  }
}

output {
  elasticsearch {
    action => "index"
    hosts => ["10.158.36.199"]
    codec => json
    index => "it_customevent"
    document_type => "dailyaggregate"
    document_id => "%{[@metadata][_id]}"   # preserve the original document IDs
  }
}

I don't see anything obviously wrong. Is there anything in the log files? The size may change if your mapping has changed; what is the document count?

FYI, this will never "end", because Logstash is supposed to run continuously, so it will keep running for the next year if you leave it.

You may want to try just exporting and re-importing the data with https://github.com/taskrabbit/elasticsearch-dump if this is a one-time mapping change. There are many other tools; Knapsack is another.
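For example, a rough sketch with elasticdump, assuming it is installed (npm install -g elasticdump), the cluster is reachable on port 9200, and the index names from this thread; you would create the target index with the new not_analyzed mapping before copying:

# copy the documents over in batches of 5000
elasticdump \
  --input=http://10.158.36.199:9200/customevent \
  --output=http://10.158.36.199:9200/it_customevent \
  --type=data \
  --limit=5000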


Thanks Ed for suggesting the tools for re-indexing.

Actually, we decided to use the Reindex API instead, re-indexing one day at a time, and this helped us finish the re-index task in 1-1.5 days (around 600 GB).
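For anyone finding this later, a minimal sketch of that approach, assuming the documents carry a date field named @timestamp (the field name and dates here are placeholders); run one request like this per day:

POST _reindex
{
  "source": {
    "index": "customevent",
    "query": {
      "range": {
        "@timestamp": {
          "gte": "2016-01-01",
          "lt": "2016-01-02"
        }
      }
    }
  },
  "dest": {
    "index": "it_customevent"
  }
}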

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.