Hello,
We have a relatively old ELK stack that we're trying to migrate away from. It's been neglected and left running to collect logs from our F5 load balancers. I'm attempting to migrate the daily F5 indices to our new 7.8 ELK stack; however, the old 6.1.2 cluster is in a bad state and times out on both the reindex API and the Logstash elasticsearch input. I believe it has too many shards, and the JVM garbage collector is causing it to crash whenever I initiate a migration.
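For reference, this is roughly the Logstash pipeline I've been using for the pull attempt. The index pattern and the new cluster's host are placeholders, not the exact values:

input {
  elasticsearch {
    hosts => ["10.21.93.121:9200"]
    index => "logstash-f5-*"          # placeholder pattern for the daily F5 indices
    docinfo => true                   # keep the original _index/_id in @metadata
    size => 500                       # smaller scroll pages to ease pressure on the old node
    scroll => "5m"
  }
}
output {
  elasticsearch {
    hosts => ["NEW-78-HOST:9200"]     # placeholder for the new 7.8 cluster
    index => "%{[@metadata][_index]}"
    document_id => "%{[@metadata][_id]}"
  }
}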
How can I get the 6.1.2 stack into a better state so I can run the index migration? I'm looking into reducing the shard count, but I'm not sure that will fix it (I've pasted the per-index check I was planning to run below, after the health output). Here is some information regarding the ES health:
[root@prod-elastic tmp]# curl -XGET '10.21.93.121:9200/_cluster/health?pretty'
{
"cluster_name" : "elasticsearch",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 4221,
"active_shards" : 4221,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 4220,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 50.00592346878332
}
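To see where the shard count is coming from before deleting or shrinking anything, I was going to run a read-only _cat call along these lines against the old node:

[root@prod-elastic tmp]# curl -XGET '10.21.93.121:9200/_cat/indices?v&h=index,pri,rep,docs.count,store.size&s=index'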
I'm seeing this GC overhead message every few seconds in elasticsearch.log:
[prod-elastic] [gc][923] overhead, spent [11.7s] collecting in the last [11.9s]
I can start a reindex API request from my 7.8 cluster; however, it appears to stall roughly halfway through, and it causes the 6.1.2 cluster to become unresponsive to any API queries.
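This is roughly the reindex-from-remote call I'm running on the 7.8 side, one daily index at a time. The index name here is only an example, and the old node's address is whitelisted via reindex.remote.whitelist in elasticsearch.yml on the 7.8 node:

curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d'
{
  "source": {
    "remote": { "host": "http://10.21.93.121:9200" },
    "index": "logstash-f5-2020.06.01",
    "size": 500
  },
  "dest": { "index": "logstash-f5-2020.06.01" }
}'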
Both ES nodes are VMs running on the same vCenter. Here are the specs for the 6.1.2 VM:
CPU: 128 vCPUs
Memory: 128 GB
JVM heap: 12 GB
Any suggestions on what I can do to migrate the indices off this cluster? We have around 3 years of historical data we'd like to keep, and I can provide more details on request.
Thanks!