Error encountered during 1.4->1.7 upgrade


(Jeff Evans) #1

We are in the process of updating our ES cluster from 1.4.4 to 1.7.3. We have a number of different indexes and nodes and want to stay live during the update, so we're doing a rolling restart. After upgrading one node and restarting it, we are seeing a stream of errors in its log like the one below. The strange part is that the aliases and indexes named in the errors aren't even hosted on this node. The errors have appeared continuously in the node's log ever since it came back online with the updated version.

Before we proceed with the rest of the update, we want to understand what is happening and make sure it's safe to continue. As of now, the alias definitions (for "30053280" in this case) appear to be intact. Any suggestions?

[2015-11-25 12:14:47,214][WARN ][action.delete            ] [node_20] unexpected error during the primary phase for action [indices:data/write/delete]
org.elasticsearch.ElasticsearchIllegalArgumentException: Alias [30053280] has index routing associated with it [515], and was provided with routing value [27958558], rejecting operation
at org.elasticsearch.cluster.metadata.MetaData.resolveIndexRouting(MetaData.java:491)
at org.elasticsearch.action.delete.TransportDeleteAction.resolveRequest(TransportDeleteAction.java:105)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.checkBlocks(TransportShardReplicationOperationAction.java:396)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$PrimaryPhase.doRun(TransportShardReplicationOperationAction.java:351)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:36)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction.doExecute(TransportShardReplicationOperationAction.java:112)
at org.elasticsearch.action.delete.TransportDeleteAction.innerExecute(TransportDeleteAction.java:145)
at org.elasticsearch.action.delete.TransportDeleteAction.doExecute(TransportDeleteAction.java:94)
at org.elasticsearch.action.delete.TransportDeleteAction.doExecute(TransportDeleteAction.java:51)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:75)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$OperationTransportHandler.messageReceived(TransportShardReplicationOperationAction.java:207)
at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$OperationTransportHandler.messageReceived(TransportShardReplicationOperationAction.java:189)
at org.elasticsearch.transport.netty.MessageChannelHandler.handleRequest(MessageChannelHandler.java:222)

(Jeff Evans) #2

It turns out we had some bad routing logic going on (the routing set in the request disagreed with the value set in the alias), and this error was unrelated to the upgrade itself. My guess is the previous version simply didn't print this error for our current log configuration.
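
For anyone who hits the same message, here is a rough sketch of the kind of check that seems to produce it, written in Python purely for illustration. The function name `resolve_index_routing` and its parameters are hypothetical stand-ins, not the Elasticsearch API; the real check lives in the Java method shown in the stack trace (`MetaData.resolveIndexRouting`). The assumption is that when an alias carries its own index routing, a request may either omit routing or supply the same value, and anything else is rejected.

```python
from typing import Optional


def resolve_index_routing(alias: str,
                          alias_routing: Optional[str],
                          request_routing: Optional[str]) -> Optional[str]:
    """Hypothetical stand-in for the server-side check; not Elasticsearch code."""
    # No routing configured on the alias: the request's routing (if any) is used.
    if alias_routing is None:
        return request_routing
    # The alias has routing and the request supplied a different value: reject.
    if request_routing is not None and request_routing != alias_routing:
        raise ValueError(
            f"Alias [{alias}] has index routing associated with it "
            f"[{alias_routing}], and was provided with routing value "
            f"[{request_routing}], rejecting operation"
        )
    # Otherwise the alias routing wins.
    return alias_routing


if __name__ == "__main__":
    # Values taken from the log line in the first post.
    try:
        resolve_index_routing("30053280", "515", "27958558")
    except ValueError as err:
        print(err)
```

In our case the fix was on the client side: either stop sending a routing value when writing through the alias, or send the one the alias is configured with.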

