What does "re-syncing mappings with cluster state for types" on startup mean?


(Matt) #1

Hey all,

I have a cluster with a couple of hundred indices, each between tens and
hundreds of megabytes, with tens of thousands to hundreds of thousands of
documents. I was doing an upgrade from 0.18.2 to 0.18.4, and it took a solid
20+ minutes to start up, with messages like the following:

re-syncing mappings with cluster state for types

for each index, each taking anywhere from a few seconds to a minute or two.
I also get missing-shard exceptions like the following on indices that were
closed before the shutdown:

[2011-11-21 11:41:19,510][DEBUG][action.admin.indices.status] [Paragon]
[index1][0], node[smWjnoWoQza9C-ql5hpGig], [P], s[INITIALIZING]: Failed to
execute
[org.elasticsearch.action.admin.indices.status.IndicesStatusRequest@38b4a41]
org.elasticsearch.index.IndexShardMissingException: [index1][0] missing
at
org.elasticsearch.index.service.InternalIndexService.shardSafe(InternalIndexService.java:177)
at
org.elasticsearch.action.admin.indices.status.TransportIndicesStatusAction.shardOperation(TransportIndicesStatusAction.java:135)
at
org.elasticsearch.action.admin.indices.status.TransportIndicesStatusAction.shardOperation(TransportIndicesStatusAction.java:58)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.performOperation(TransportBroadcastOperationAction.java:232)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction.performOperation(TransportBroadcastOperationAction.java:210)
at
org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$AsyncBroadcastAction$1.run(TransportBroadcastOperationAction.java:186)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

Are the exceptions something I need to worry about? Can somebody explain
what it's doing during this time? Lastly, what can I do to
eliminate/minimize this time?

Cheers for any help or explanation anybody can provide!

Matt


(Shay Banon) #2

Usually, this "re-sync" should not happen, but sometimes the order in which
mappings were serialized was wrong, and in order to fix it the mappings need
to be "re-synced". There was a change like that in 0.18.3/0.18.4, if you used
root-level mapping elements (like _source / _size). This will be "heavier"
the more indices / types you have, since it needs to be applied to all of
them, but the re-sync only happens when an unordered part of the
serialization is found, which I hope we have nailed down by now.
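To illustrate why ordering matters here, the sketch below (plain Python, not Elasticsearch code; the mapping snippets are made up) shows how two semantically identical mappings can still differ byte-for-byte when their root-level elements are serialized in different orders, and how re-serializing both in a canonical order removes the difference:

```python
import json

# Hypothetical illustration: the same mapping with root-level elements
# (like _source) serialized in two different orders.
mapping_a = '{"_source": {"enabled": true}, "properties": {"name": {"type": "string"}}}'
mapping_b = '{"properties": {"name": {"type": "string"}}, "_source": {"enabled": true}}'

# Parsed, the two mappings are equal...
assert json.loads(mapping_a) == json.loads(mapping_b)

# ...but byte-for-byte they differ, so a binary comparison flags a mismatch.
print(mapping_a == mapping_b)  # False

# Re-serializing both with a consistent key order makes the bytes match
# again, which is roughly what the re-sync achieves.
canonical_a = json.dumps(json.loads(mapping_a), sort_keys=True)
canonical_b = json.dumps(json.loads(mapping_b), sort_keys=True)
print(canonical_a == canonical_b)  # True
```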

Regarding the indices status exceptions, you can ignore them; they should be harmless.



(Matt) #3

I see. Is this a fix in master that will go out as part of the next point
release (0.18.5)? Thanks for the explanation!


(Shay Banon) #4

No, it's not a fix. I was explaining the change in 0.18.3 and why mappings
might need to be "re-synced", which is a one-time cost when upgrading to 0.18
(and only applicable in certain situations, depending on your mappings).



(Matt) #5

Ok, ok, I think I got it: because I jumped from 0.17.8 to 0.18.2 then to
0.18.4, I saw this happen twice. Now, if I understand correctly, subsequent
restarts should no longer have to go through this. Is that about right?


(Shay Banon) #6

Yea, you got it. The way a node checks whether mappings have changed is with
a binary comparison of the serialized form of the mappings (regenerated by
ES to make sure it's ordered and consistent).
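The check described here can be sketched as follows. This is plain Python with hypothetical names (`regenerate`, `needs_resync`), not Elasticsearch internals; it only illustrates the idea of comparing canonically regenerated serializations byte-for-byte:

```python
import json

def regenerate(source: str) -> str:
    # Canonical re-serialization: parse the mapping, then emit it with a
    # fixed key order and fixed formatting (stand-in for ES regenerating
    # the mapping in a consistent order).
    return json.dumps(json.loads(source), sort_keys=True, separators=(",", ":"))

def needs_resync(node_mapping: str, cluster_state_mapping: str) -> bool:
    # A plain binary (string) comparison of the two regenerated forms;
    # any real difference shows up as a mismatch and triggers a re-sync.
    return regenerate(node_mapping) != regenerate(cluster_state_mapping)

# Semantically identical mappings whose root-level elements merely appear
# in a different order no longer trigger a re-sync once both sides are
# regenerated consistently.
print(needs_resync(
    '{"_source": {"enabled": true}, "properties": {}}',
    '{"properties": {}, "_source": {"enabled": true}}',
))  # False
```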


