Reindexing of data after 1.6 -> 2.2 upgrade


(Ron) #1

TL;DR: will data nodes be offline during reindexing after a major upgrade (1.x -> 2.x)?

Long story:
Our DEV cluster has 4 nodes - 2 pure data nodes, 1 master and 1 client-only node.

I understand that after upgrading, the indexes will be rebuilt (new version of Lucene, etc.), and also that I had to set up unicast since multicast is no longer used for discovery.

The configuration changes I made are as follows:

network.host: _eth0_
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.x.x.x"]

Where the value in the unicast.hosts property is the IP address of the single master node. The above are set the same on all four boxes.

So after upgrading all the nodes, I see the two data nodes maxing out CPU frequently; they seem busy rebuilding the indexes (which I understood to be normal). The issue is that when I hit the client node, the only nodes that appear in the cluster are my master and client nodes; the data nodes aren't showing as part of the cluster (looking at /_nodes?pretty).

Is this normal? Will the data nodes be offline until reindexing is complete?
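For reference, this is how I'm checking cluster membership (a sketch, assuming the client node is reachable on localhost:9200 -- adjust host/port for your setup):

```shell
# List every node the cluster currently knows about, with roles and names
curl -s 'localhost:9200/_cat/nodes?v'

# Cluster-wide view: status (green/yellow/red), node count, unassigned shards
curl -s 'localhost:9200/_cluster/health?pretty'
```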


(Mark Walkom) #2

It doesn't reindex; it may upgrade the underlying segments to the latest Lucene version, but only when they merge. I don't know how long that takes though, TBH.
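If you want to see where segments stand, the upgrade/segments APIs report which segments were written by an older Lucene version (a sketch, assuming a node on localhost:9200):

```shell
# Summary of which indices still contain old-format segments
curl -s 'localhost:9200/_upgrade?pretty'

# Per-shard segment detail, including the Lucene version each segment was written with
curl -s 'localhost:9200/_segments?pretty'
```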


(Ron) #3

I let it run overnight; the high CPU activity appears to have settled, but the logs are littered with these messages:

ElasticsearchException[failed to flush exporter bulks]
    at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:104)
    at org.elasticsearch.marvel.agent.exporter.ExportBulk.close(ExportBulk.java:53)
    at org.elasticsearch.marvel.agent.AgentService$ExportingWorker.run(AgentService.java:201)
    at java.lang.Thread.run(Thread.java:745)
    Suppressed: ElasticsearchException[failed to flush [default_local] exporter bulk]; nested: ElasticsearchException[failure in bulk execution:
[0]: index [.marvel-es-2016.02.22], type [node_stats], id [AVMLbBxGhr_mPmdg1NfF], message [UnavailableShardsException[[.marvel-es-2016.02.22][0] primary shard is not active Timeout: [1m], request: [shard bulk {[.marvel-es-2016.02.22][0]}]]]];
        at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:106)
        ... 3 more
    Caused by: ElasticsearchException[failure in bulk execution:
[0]: index [.marvel-es-2016.02.22], type [node_stats], id [AVMLbBxGhr_mPmdg1NfF], message [UnavailableShardsException[[.marvel-es-2016.02.22][0] primary shard is not active Timeout: [1m], request: [shard bulk {[.marvel-es-2016.02.22][0]}]]]]
        at org.elasticsearch.marvel.agent.exporter.local.LocalBulk.flush(LocalBulk.java:114)
        at org.elasticsearch.marvel.agent.exporter.ExportBulk$Compound.flush(ExportBulk.java:101)
        ... 3 more


(Ron) #4

So apparently something was messed up with the .marvel* and .kibana indexes; deleting those seemed to fix things.

This was a dev environment, so losing these indexes isn't a big deal, but I'm curious why they would have caused an issue.
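For the record, this is roughly what I ran (destructive -- it wipes the Marvel monitoring data and the Kibana index with its saved dashboards, so only do this where that data is disposable):

```shell
# Delete all Marvel time-series indexes (the wildcard matches .marvel-es-YYYY.MM.DD)
curl -XDELETE 'localhost:9200/.marvel-*'

# Delete the Kibana index (saved searches, visualizations, dashboards)
curl -XDELETE 'localhost:9200/.kibana'
```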


(Mark Walkom) #5

You'd have to look in the logs to see why they became unallocated.
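A quick way to spot which shards were stuck, as a starting point before digging into the master's log (sketch, assuming localhost:9200):

```shell
# List every shard with its state; stuck ones are reported as UNASSIGNED
curl -s 'localhost:9200/_cat/shards?v' | grep UNASSIGNED
```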


(system) #6