Unable to migrate indices from old 6.1.2 Elasticsearch node to 7.8 Elasticsearch node


We have a relatively old ELK stack that we're trying to migrate off of. It has been neglected and left running to collect logs from our F5 load balancers. I'm trying to migrate the daily F5 indices to our new 7.8 ELK stack, but the old 6.1.2 stack is in a bad state and times out on both the reindex API and the Logstash elasticsearch input. I believe it has too many shards, and JVM garbage collection is causing it to crash whenever I start a migration.

How can I get the 6.1.2 ELK stack into a better state so I can start the index migration? I'm looking into reducing the shard count, but I'm not sure that will fix it. Here is the cluster health:

    [root@prod-elastic tmp]#  curl -XGET ''
      "cluster_name" : "elasticsearch",
      "status" : "yellow",
      "timed_out" : false,
      "number_of_nodes" : 1,
      "number_of_data_nodes" : 1,
      "active_primary_shards" : 4221,
      "active_shards" : 4221,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 4220,
      "delayed_unassigned_shards" : 0,
      "number_of_pending_tasks" : 0,
      "number_of_in_flight_fetch" : 0,
      "task_max_waiting_in_queue_millis" : 0,
      "active_shards_percent_as_number" : 50.00592346878332

I'm seeing this GC message every few seconds in elasticsearch.log:

    [prod-elastic] [gc][923] overhead, spent [11.7s] collecting in the last [11.9s]

I can start a reindex API request from my 7.8 cluster, but it stops working roughly halfway through. It also makes the 6.1.2 cluster unresponsive to any API requests.

Both ES nodes are VMs running on the same vCenter. Here are the stats for the 6.1.2 VM:

CPU: 128
Memory: 128 GB
JVM heap: 12 GB

Any suggestions on what I can do to migrate the indices off this cluster? We have around 3 years of historical data we'd like to keep. I can provide more details as requested.

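You have a lot of shards and a smallish heap. Step one is to raise the heap to 32GB or so to stop the GC thrashing and the problems that follow from it; your log shows 11.7s of the last 11.9s spent collecting, which leaves the node almost no time for real work. You have 128GB of RAM, so you can go much higher (64GB, say) if that is what it takes to keep the node from crashing, even though a heap that large is slower because the JVM can no longer use compressed object pointers. That should not be needed, though; 32GB is a lot for just metadata.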
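
The heap itself is set with the `-Xms` and `-Xmx` lines in `jvm.options`, followed by a node restart. To confirm what the node actually picked up afterwards, something like the following may help. It is only a sketch: the `OLD_ES` address and the helper names are made up for illustration, and it assumes the nodes info API (`GET /_nodes/jvm`) reports `heap_max_in_bytes` and `using_compressed_ordinary_object_pointers` the way recent 6.x releases do.

```python
import json
import urllib.request

OLD_ES = "http://localhost:9200"  # assumption: address of the 6.1.2 node


def summarize_jvm(nodes_info):
    """Return (node name, heap max in GiB, compressed-oops flag) per node."""
    rows = []
    for node in nodes_info["nodes"].values():
        jvm = node["jvm"]
        heap_gib = jvm["mem"]["heap_max_in_bytes"] / 1024 ** 3
        # The API reports this flag as the string "true" or "false".
        compressed = jvm.get("using_compressed_ordinary_object_pointers") == "true"
        rows.append((node["name"], round(heap_gib, 1), compressed))
    return rows


def fetch_jvm_info(base_url=OLD_ES):
    """Fetch JVM details from the nodes info API."""
    with urllib.request.urlopen(base_url + "/_nodes/jvm", timeout=60) as resp:
        return json.load(resp)
```

Calling `summarize_jvm(fetch_jvm_info())` after the restart should show the new heap size. If the compressed-oops flag comes back false at a setting near 32GB, dropping the heap by a gigabyte or so usually brings it back.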

If it's still not stable, close the older indices you don't need right away, reindex/migrate the ones that are open, then close those and open the next set, and so on in a rolling manner to keep the open index/shard count under control.
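
A rough sketch of that rolling approach, assuming every F5 index on the old node has already been closed (for example with `POST /f5-*/_close`) and that the old node is listed under `reindex.remote.whitelist` on the 7.8 cluster. The host names, the batch size, and all helper names here are placeholders, not anything from your setup.

```python
import json
import urllib.request

OLD_ES = "http://old-es:9200"  # assumption: the 6.1.2 node
NEW_ES = "http://new-es:9200"  # assumption: a node in the 7.8 cluster


def chunks(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def reindex_body(index, remote_host=OLD_ES):
    """Reindex-from-remote body; it is sent to the 7.8 cluster."""
    return {
        "source": {
            "remote": {"host": remote_host, "socket_timeout": "5m"},
            "index": index,
            "size": 500,  # scroll batch size; smaller is gentler on the old node
        },
        "dest": {"index": index},
    }


def call(method, url, body=None):
    """Send a JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=3600) as resp:
        return json.load(resp)


def migrate_rolling(indices, batch_size=30):
    """Open one batch on the old node, copy it, close it again, repeat."""
    for batch in chunks(sorted(indices), batch_size):
        names = ",".join(batch)
        call("POST", f"{OLD_ES}/{names}/_open")
        # Wait until the reopened primaries are allocated before reading them.
        call("GET", f"{OLD_ES}/_cluster/health/{names}"
                    "?wait_for_status=yellow&timeout=5m")
        for index in batch:
            call("POST", f"{NEW_ES}/_reindex?wait_for_completion=true",
                 reindex_body(index))
        call("POST", f"{OLD_ES}/{names}/_close")
```

Only `batch_size` indices are open on the old node at any time, so the number of active shards it has to manage stays bounded regardless of how many indices exist in total.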

Thanks, bumping the JVM heap up temporarily allowed us to migrate the indices off. We were able to consolidate them to reduce shard count.
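
For anyone finding this later, one way to do that kind of consolidation is to fold the daily indices into monthly ones during the reindex. The sketch below only builds the reindex-from-remote request bodies. It assumes a hypothetical daily naming scheme of `f5-YYYY.MM.DD` (adjust to the real names) and the function names are illustrative.

```python
from collections import defaultdict


def month_of(daily_index):
    """'f5-2019.03.14' -> 'f5-2019.03' (hypothetical naming scheme)."""
    return daily_index.rsplit(".", 1)[0]


def group_by_month(daily_indices):
    """Map each monthly index name to its sorted list of daily indices."""
    groups = defaultdict(list)
    for name in daily_indices:
        groups[month_of(name)].append(name)
    return {month: sorted(names) for month, names in sorted(groups.items())}


def consolidation_bodies(daily_indices, remote_host):
    """One reindex-from-remote body per month, each writing to one index."""
    return {
        month: {
            "source": {"remote": {"host": remote_host}, "index": names},
            "dest": {"index": month},
        }
        for month, names in group_by_month(daily_indices).items()
    }
```

Each body would be posted to `_reindex` on the 7.8 cluster. With roughly three years of data, that leaves on the order of 36 monthly indices instead of one per day.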
