Reindexing of data after 1.6 -> 2.2 upgrade


(Ron) #1

TL;DR: will data nodes be offline during reindexing after a major upgrade (1.x -> 2.x)?

Long story:
Our DEV cluster has 4 nodes - 2 pure data nodes, 1 master and 1 client-only node.

I understand that after upgrading, the indexes will be rebuilt (new version of Lucene, etc.), and also that I had to set up unicast since multicast is no longer used for discovery.

The configuration changes I made are as follows:

network.host: _eth0_
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.x.x.x"]

Where the value in the unicast.hosts property is the IP address of the single master node. The above are set the same on all four boxes.

So after upgrading all the nodes, I see the two data nodes maxing out CPU frequently; they seem busy rebuilding the indexes (which I understood to be normal). The issue is that when I hit the client node, the only nodes that appear in the cluster are my master and client nodes; the data nodes aren't showing as part of the cluster (looking at /_nodes?pretty).

Is this normal? Will the data nodes be offline until reindexing is complete?
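For reference, this is how I'm checking cluster membership (a sketch, assuming the client node is reachable on localhost:9200 -- adjust host/port for your setup):

```shell
# List every node the cluster currently knows about, with roles and names
curl -s 'localhost:9200/_cat/nodes?v'

# Cluster-wide view: status (green/yellow/red), node count, unassigned shards
curl -s 'localhost:9200/_cluster/health?pretty'
```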


(Mark Walkom) #2

It doesn't reindex; it may upgrade the underlying segments to the latest Lucene version, but only when they merge. I don't know how long that takes though, TBH.
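If you want to see where segments stand, the upgrade/segments APIs report which segments were written by an older Lucene version (a sketch, assuming a node on localhost:9200):

```shell
# Summary of which indices still contain old-format segments
curl -s 'localhost:9200/_upgrade?pretty'

# Per-shard segment detail, including the Lucene version each segment was written with
curl -s 'localhost:9200/_segments?pretty'
```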


(Ron) #3

I let it run overnight; the high CPU activity appears to have settled, but the logs are littered with these messages:

ElasticsearchException[failed to flush exporter bulks]
    at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:104)
    at org.elasticsearch.marvel.agent.exporter.ExportBulk.close(ExportBulk.java:53)
    at org.elasticsearch.marvel.agent.AgentService$ExportingWorker.run(AgentService.java:201)
    at java.lang.Thread.run(Thread.java:745)
    Suppressed: ElasticsearchException[failed to flush [default_local] exporter bulk]; nested: ElasticsearchException[failure in bulk execution:
[0]: index [.marvel-es-2016.02.22], type [node_stats], id [AVMLbBxGhr_mPmdg1NfF], message [UnavailableShardsException[[.marvel-es-2016.02.22][0] primary shard is not active Timeout: [1m], request: [shard bulk {[.marvel-es-2016.02.22][0]}]]]];
        at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:106)
        ... 3 more
    Caused by: ElasticsearchException[failure in bulk execution:
[0]: index [.marvel-es-2016.02.22], type [node_stats], id [AVMLbBxGhr_mPmdg1NfF], message [UnavailableShardsException[[.marvel-es-2016.02.22][0] primary shard is not active Timeout: [1m], request: [shard bulk {[.marvel-es-2016.02.22][0]}]]]]
        at org.elasticsearch.marvel.agent.exporter.local.LocalBulk.flush(LocalBulk.java:114)
        at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:101)
        ... 3 more


(Ron) #4

So apparently something was messed up with the .marvel* and .kibana indexes; deleting those seemed to fix things.

This was a dev environment, so losing these indexes isn't a big deal, but I'm curious why they would have caused an issue.
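For the record, this is roughly what I ran (destructive -- it wipes the Marvel monitoring data and the Kibana index with its saved dashboards, so only do this where that data is disposable):

```shell
# Delete all Marvel time-series indexes (the wildcard matches .marvel-es-YYYY.MM.DD)
curl -XDELETE 'localhost:9200/.marvel-*'

# Delete the Kibana index (saved searches, visualizations, dashboards)
curl -XDELETE 'localhost:9200/.kibana'
```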


(Mark Walkom) #5

You'd have to look in the logs to see why they became unallocated.
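A quick way to spot which shards were stuck, as a starting point before digging into the master's log (sketch, assuming localhost:9200):

```shell
# List every shard with its state; stuck ones are reported as UNASSIGNED
curl -s 'localhost:9200/_cat/shards?v' | grep UNASSIGNED
```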


(system) #6