Upgrade from ES 2.3.4 to 2.4.0 causes a performance drop, and downgrading back leads to a shard allocation problem


(Szabolcs Szentes) #1

Hi,

After upgrading our ES cluster (3 nodes) from 2.3.4 to 2.4.0, performance dropped.
The upgrade was a rolling upgrade that kept the existing configuration and data (no reindexing or restoring).
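
For context, the usual rolling-upgrade procedure toggles the `cluster.routing.allocation.enable` setting: it is set to "none" before a node is stopped and back to "all" once the node has rejoined. The sketch below builds and sends that request with the standard library only. The cluster address `ES_URL` and the helper names are hypothetical, and the choice of a transient (rather than persistent) setting is an assumption.

```python
import json
import urllib.request

# Hypothetical address of one cluster node; adjust to your environment.
ES_URL = "http://localhost:9200"

# Values accepted by cluster.routing.allocation.enable.
ALLOCATION_MODES = ("all", "primaries", "new_primaries", "none")


def allocation_body(mode: str) -> dict:
    """Build the PUT /_cluster/settings body for cluster.routing.allocation.enable.

    "none" stops shard allocation before a node is restarted;
    "all" re-enables it once the node has rejoined the cluster.
    """
    if mode not in ALLOCATION_MODES:
        raise ValueError(f"unsupported allocation mode: {mode}")
    # Assumption: a transient setting is enough for the duration of the upgrade.
    return {"transient": {"cluster.routing.allocation.enable": mode}}


def set_allocation(mode: str, base_url: str = ES_URL) -> dict:
    """Send the settings update and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(allocation_body(mode)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: call `set_allocation("none")` before stopping a node and `set_allocation("all")` after it shows up again in `GET /_cat/nodes`.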

Then we decided to downgrade back to 2.3.4. After downgrading one node and re-enabling shard allocation, the node joins the cluster but does not get any data (no shards are allocated to it). Maybe this is because of the Lucene version difference (2.3.4 ships Lucene 5.5.0 and 2.4.0 ships Lucene 5.5.2), but that is just a guess.
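
One way to test the version guess is to list which nodes run an older release than the rest. Elasticsearch's allocation rules are generally understood to refuse placing a replica on a node whose version is older than the node holding the primary, which would be consistent with what is described above. The sketch below works on the parsed JSON of `GET /_nodes`, whose `nodes` object maps node IDs to entries with `name` and `version` fields. It assumes plain numeric versions such as "2.4.0"; the function names are illustrative.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "2.4.0" into (2, 4, 0) so versions compare numerically.

    Assumes a purely numeric version string (no "-alpha"/"-SNAPSHOT" suffix).
    """
    return tuple(int(part) for part in version.split("."))


def older_nodes(nodes_info: dict) -> List[str]:
    """Return the names of nodes running an older version than the newest node.

    `nodes_info` is the parsed JSON body of GET /_nodes.
    """
    versions: Dict[str, Tuple[int, ...]] = {
        node["name"]: parse_version(node["version"])
        for node in nodes_info["nodes"].values()
    }
    newest = max(versions.values())
    return sorted(name for name, v in versions.items() if v < newest)
```

If the downgraded node is the only name returned, replicas whose primaries live on the 2.4.0 nodes would be expected to stay away from it.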

Now we are effectively left with a cluster of 2 nodes (2.4.0) with reduced performance.
We are considering two options:

  1. Reindex the data in this cluster to see whether that solves the performance problem. If it does, upgrade the 3rd node again.
  2. Take a snapshot of the current data, set up a separate cluster of 2.3.4 nodes, restore the data there, and check whether that cluster works (in terms of both shard allocation and performance).
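
Both options map onto standard REST calls: `POST /_reindex` (available since 2.3) for option 1, and the `_snapshot` repository, snapshot and `_restore` endpoints for option 2. The sketch below only builds the requests as (method, path, body) tuples so they can be reviewed before sending. The index names are taken from the shard listing in this post; the `-reindexed` suffix, repository name, snapshot name and filesystem location are hypothetical. Note that Elasticsearch documents restoring snapshots into the same or a newer version, so a 2.3.4 cluster may refuse a snapshot taken on 2.4.0; treat option 2 as something to verify first.

```python
from typing import List, Tuple

# (HTTP method, path, JSON body)
Request = Tuple[str, str, dict]

# Index names from the shard listing in this post.
INDICES = ["reco-performers-1470150493712", "reco-performers-1470154237378"]


def reindex_requests(indices: List[str], suffix: str = "-reindexed") -> List[Request]:
    """Option 1: copy each index into a fresh (hypothetically named) index."""
    return [
        ("POST", "/_reindex", {"source": {"index": idx}, "dest": {"index": idx + suffix}})
        for idx in indices
    ]


def snapshot_requests(
    indices: List[str],
    repo: str = "reco_backup",            # hypothetical repository name
    location: str = "/mnt/es-backups",    # hypothetical path; must be listed in path.repo
    snapshot: str = "before-downgrade",   # hypothetical snapshot name
) -> List[Request]:
    """Option 2: register an fs repository, snapshot the indices, then restore them."""
    index_list = ",".join(indices)
    return [
        # Register a shared-filesystem repository on the source cluster.
        ("PUT", f"/_snapshot/{repo}", {"type": "fs", "settings": {"location": location}}),
        # Take the snapshot and block until it finishes.
        ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
         {"indices": index_list}),
        # Run on the target cluster after registering the same repository there.
        ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", {"indices": index_list}),
    ]
```

Usage: iterate over either list and send each tuple with any HTTP client; keeping the old indices until the reindexed copies are verified makes option 1 easy to roll back.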

This is the cluster health:
{
  "cluster_name" : "reco-cluster",
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 2,
  "active_shards" : 4,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 2,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 66.66666666666666
}
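
The health figures are internally consistent: "yellow" means every primary is assigned but at least one replica is not, and 4 active out of 6 shard copies gives the reported 66.67%. The sketch below derives these numbers from the fields above. It assumes all primaries are active (true for yellow) and relies on the rule that a replica is never placed on the same node as its primary, so each shard needs as many nodes able to accept it as it has copies. The function name is illustrative.

```python
# Fields copied from the cluster health output in this post.
HEALTH = {
    "status": "yellow",
    "number_of_data_nodes": 3,
    "active_primary_shards": 2,
    "active_shards": 4,
    "initializing_shards": 0,
    "unassigned_shards": 2,
}


def summarize(health: dict) -> dict:
    """Derive shard-copy totals and replica counts from a _cluster/health body."""
    total = (
        health["active_shards"]
        + health["initializing_shards"]
        + health["unassigned_shards"]
    )
    # Only meaningful when every primary is active (status yellow or green).
    copies_per_primary = total / health["active_primary_shards"]
    return {
        "total_shard_copies": total,
        "active_percent": health["active_shards"] / total * 100,
        "replicas_per_primary": copies_per_primary - 1,
        # Primary and replicas of one shard must sit on distinct nodes.
        "data_nodes_needed": int(copies_per_primary),
    }
```

With these inputs the cluster needs 3 nodes that can accept copies, but only the two 2.4.0 nodes currently do, which matches the 2 unassigned replicas.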

Shards:
index                         shard prirep state      docs   store   ip           node
reco-performers-1470150493712 0     p      STARTED    798742 553.3mb 80.77.123.27 sonrisa2
reco-performers-1470150493712 0     r      STARTED    798742 553.3mb 80.77.123.28 sonrisa3
reco-performers-1470150493712 0     r      UNASSIGNED
reco-performers-1470154237378 0     p      STARTED    807498 854.4mb 80.77.123.27 sonrisa2
reco-performers-1470154237378 0     r      STARTED    807498 854.2mb 80.77.123.28 sonrisa3
reco-performers-1470154237378 0     r      UNASSIGNED
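
The listing matches the default `_cat/shards` columns (index, shard, prirep, state, docs, store, ip, node). The sketch below parses that plain-text output to list unassigned copies per index and the nodes that actually hold data, which makes it easy to confirm that the downgraded node receives nothing. It assumes the raw output without the header row; the function names are illustrative.

```python
from typing import Dict, List, Set

# Raw _cat/shards output from this post (no header row).
CAT_SHARDS = """\
reco-performers-1470150493712 0 p STARTED 798742 553.3mb 80.77.123.27 sonrisa2
reco-performers-1470150493712 0 r STARTED 798742 553.3mb 80.77.123.28 sonrisa3
reco-performers-1470150493712 0 r UNASSIGNED
reco-performers-1470154237378 0 p STARTED 807498 854.4mb 80.77.123.27 sonrisa2
reco-performers-1470154237378 0 r STARTED 807498 854.2mb 80.77.123.28 sonrisa3
reco-performers-1470154237378 0 r UNASSIGNED
"""


def unassigned_by_index(cat_output: str) -> Dict[str, List[str]]:
    """Group shard copies that have no node, keyed by index name."""
    result: Dict[str, List[str]] = {}
    for line in cat_output.splitlines():
        cols = line.split()
        if len(cols) < 4:
            continue  # skip blank or malformed lines
        index, shard, prirep, state = cols[:4]
        if state == "UNASSIGNED":
            kind = "primary" if prirep == "p" else "replica"
            result.setdefault(index, []).append(f"shard {shard} {kind}")
    return result


def nodes_with_shards(cat_output: str) -> Set[str]:
    """Return the names of nodes holding at least one shard copy."""
    nodes: Set[str] = set()
    for line in cat_output.splitlines():
        cols = line.split()
        # Assigned rows have 8 columns and the node name is the last one.
        if len(cols) == 8 and cols[0] != "index" and cols[3] != "UNASSIGNED":
            nodes.add(cols[7])
    return nodes
```

Here every unassigned copy is a replica and only two node names appear, so the third (downgraded) node holds no shards at all.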

Any other suggestions would be appreciated!
Thanks
Szabolcs

