Closing indices by itself

Hi

Our cluster has started going into RED state every few minutes.
The log doesn't say anything about why it goes RED, but judging from the output below, it seems to be closing and opening indices by itself, for some unknown reason.

It should be noted that we're not at all under memory or CPU pressure, but this cluster does have 2029 indices with a total of 8100 shards (2p+1r), split across 5 nodes.

[2018-07-12T21:18:03,978][INFO ][o.e.c.m.MetaDataIndexStateService] [Prod-04] closing indices [[[index1/rXe9XZwkQ2iXIUrf5FJLzQ]]]
[2018-07-12T21:18:04,609][INFO ][o.e.c.m.MetaDataIndexStateService] [Prod-04] closing indices [[[index2/81rY1m69RMWW6FIGsPDrug]]]
[2018-07-12T21:18:06,436][INFO ][o.e.c.m.MetaDataIndexStateService] [Prod-04] opening indices [[[index1/rXe9XZwkQ2iXIUrf5FJLzQ]]]
[2018-07-12T21:18:07,109][INFO ][o.e.c.m.MetaDataIndexStateService] [Prod-04] opening indices [[[index2/81rY1m69RMWW6FIGsPDrug]]]
[2018-07-12T21:18:09,967][INFO ][o.e.c.r.a.AllocationService] [Prod-04] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[index2][0], [index2][1]] ...]).
[2018-07-12T21:18:10,046][WARN ][o.e.c.a.s.ShardStateAction] [Prod-04] [index2][0] received shard failed for shard id [[index2][0]], allocation id [9_YF-DRLSWq4Lx-FpX3shg], primary term [3], message [mark copy as stale]
[2018-07-12T21:18:10,118][INFO ][o.e.c.m.MetaDataIndexStateService] [Prod-04] closing indices [[[index3/RbneN50ZTSC762qnkY_j4A]]]
[2018-07-12T21:18:11,072][WARN ][o.e.c.a.s.ShardStateAction] [Prod-04] [index2][1] received shard failed for shard id [[index2][1]], allocation id [-9PO9zR2RaK2P1QnIsHtmw], primary term [4], message [mark copy as stale]
[2018-07-12T21:18:11,191][WARN ][o.e.c.a.s.ShardStateAction] [Prod-04] [index2][0] received shard failed for shard id [[index2][0]], allocation id [9_YF-DRLSWq4Lx-FpX3shg], primary term [3], message [mark copy as stale]
[2018-07-12T21:18:11,259][WARN ][o.e.c.a.s.ShardStateAction] [Prod-04] [index2][1] received shard failed for shard id [[index2][1]], allocation id [-9PO9zR2RaK2P1QnIsHtmw], primary term [4], message [mark copy as stale]
[2018-07-12T21:18:11,568][WARN ][o.e.c.a.s.ShardStateAction] [Prod-04] [index2][1] received shard failed for shard id [[index2][1]], allocation id [-9PO9zR2RaK2P1QnIsHtmw], primary term [4], message [mark copy as stale]
[2018-07-12T21:18:11,580][WARN ][o.e.c.a.s.ShardStateAction] [Prod-04] [index2][1] received shard failed for shard id [[index2][1]], allocation id [-9PO9zR2RaK2P1QnIsHtmw], primary term [4], message [mark copy as stale]
[2018-07-12T21:18:11,717][WARN ][o.e.c.a.s.ShardStateAction] [Prod-04] [index2][1] received shard failed for shard id [[index2][1]], allocation id [-9PO9zR2RaK2P1QnIsHtmw], primary term [4], message [mark copy as stale]
[2018-07-12T21:18:12,028][WARN ][o.e.c.a.s.ShardStateAction] [Prod-04] [index2][1] received shard failed for shard id [[index2][1]], allocation id [-9PO9zR2RaK2P1QnIsHtmw], primary term [4], message [mark copy as stale]
[2018-07-12T21:18:12,094][WARN ][o.e.c.a.s.ShardStateAction] [Prod-04] [index2][1] received shard failed for shard id [[index2][1]], allocation id [-9PO9zR2RaK2P1QnIsHtmw], primary term [4], message [mark copy as stale]
[2018-07-12T21:18:12,107][WARN ][o.e.c.a.s.ShardStateAction] [Prod-04] [index2][1] received shard failed for shard id [[index2][1]], allocation id [-9PO9zR2RaK2P1QnIsHtmw], primary term [4], message [mark copy as stale]
[2018-07-12T21:18:12,232][WARN ][o.e.c.a.s.ShardStateAction] [Prod-04] [index2][1] received shard failed for shard id [[index2][1]], allocation id [-9PO9zR2RaK2P1QnIsHtmw], primary term [4], message [mark copy as stale]
[2018-07-12T21:18:12,271][WARN ][o.e.c.a.s.ShardStateAction] [Prod-04] [index2][1] received shard failed for shard id [[index2][1]], allocation id [-9PO9zR2RaK2P1QnIsHtmw], primary term [4], message [mark copy as stale]
[2018-07-12T21:18:12,337][WARN ][o.e.c.a.s.ShardStateAction] [Prod-04] [index2][1] received shard failed for shard id [[index2][1]], allocation id [-9PO9zR2RaK2P1QnIsHtmw], primary term [4], message [mark copy as stale]
[2018-07-12T21:18:12,533][WARN ][o.e.c.a.s.ShardStateAction] [Prod-04] [index2][1] received shard failed for shard id [[index2][1]], allocation id [-9PO9zR2RaK2P1QnIsHtmw], primary term [4], message [mark copy as stale]
[2018-07-12T21:18:12,581][WARN ][o.e.c.a.s.ShardStateAction] [Prod-04] [index2][1] received shard failed for shard id [[index2][1]], allocation id [-9PO9zR2RaK2P1QnIsHtmw], primary term [4], message [mark copy as stale]
[2018-07-12T21:18:12,651][INFO ][o.e.c.r.a.AllocationService] [Prod-04] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[index2][1]] ...]).
[2018-07-12T21:18:12,929][INFO ][o.e.c.m.MetaDataIndexStateService] [Prod-04] opening indices [[[index3/RbneN50ZTSC762qnkY_j4A]]]
[2018-07-12T21:18:14,743][INFO ][o.e.c.r.a.AllocationService] [Prod-04] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[index3][1], [index3][0]] ...]).
[2018-07-12T21:18:15,977][INFO ][o.e.c.r.a.AllocationService] [Prod-04] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[index3][0], [index3][1]] ...]).
[2018-07-12T21:19:00,676][INFO ][o.e.c.m.MetaDataIndexStateService] [Prod-04] closing indices [[[index4/I_GucK5USTq14yLAjeuGTQ]]]
[2018-07-12T21:19:02,120][INFO ][o.e.c.m.MetaDataIndexStateService] [Prod-04] opening indices [[[index4/I_GucK5USTq14yLAjeuGTQ]]]
[2018-07-12T21:19:04,762][INFO ][o.e.c.r.a.AllocationService] [Prod-04] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[index4][1]] ...]).
[2018-07-12T21:19:06,953][INFO ][o.e.c.r.a.AllocationService] [Prod-04] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[index4][1]] ...]).
[2018-07-12T21:21:00,871][INFO ][o.e.c.m.MetaDataIndexStateService] [Prod-04] closing indices [[[index5/yM15ExujQq-69pEqGdDTtQ]]]
[2018-07-12T21:21:02,517][INFO ][o.e.c.m.MetaDataIndexStateService] [Prod-04] opening indices [[[index5/yM15ExujQq-69pEqGdDTtQ]]]
[2018-07-12T21:21:04,503][INFO ][o.e.c.r.a.AllocationService] [Prod-04] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[index5][1], [index5][0]] ...]).
[2018-07-12T21:21:06,431][INFO ][o.e.c.r.a.AllocationService] [Prod-04] Cluster health status changed from [YELLOW] to [GREEN] (reason: [shards started [[index5][0]] ...]).

Any suggestions as to what is causing this?

Indices don't just open and close themselves, so something must be calling the API to do this.
You may want to enable debug logging to see if you can catch what is making the calls, or enable X-Pack Security to track and stop it.
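
For example, something along these lines should surface more detail from the service that writes those open/close lines (a sketch only; the logger name and the localhost:9200 endpoint are assumptions you'd adapt to your setup):

# Raise logging for the cluster metadata service (o.e.c.m.*) that emits the closing/opening messages
curl -XPUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{
  "transient": {
    "logger.org.elasticsearch.cluster.metadata": "DEBUG"
  }
}'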

That shard count is overloading the cluster; you should look at reducing it ASAP.

Hi, warkolm

You made us go back and look at the code again, and there does indeed seem to be a common code path that ends up opening and closing indices, so thank you for pointing out that ES cannot do that by itself.
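
For reference, that code path ultimately issues the equivalent of the close/open index APIs, along these lines (index name and endpoint are just examples):

# Close and then reopen an index - this is what produces the MetaDataIndexStateService log lines above
curl -XPOST 'http://localhost:9200/index1/_close'
curl -XPOST 'http://localhost:9200/index1/_open'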

So to follow up on your suggestion of bringing down the shard count:
As we're running a multi-tenancy setup with an index per tenant, we need relevance for each tenant to be as good as possible, which, IIRC, is closely tied to the term stats of each index. If I start merging more tenants into the same indices, they would also share term statistics, which would make algorithms like TF/IDF give less relevant results.
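
To make that concrete, as far as I understand, Lucene's BM25 computes the IDF part of the score from per-shard statistics, roughly:

idf(t) = log(1 + (docCount - docFreq(t) + 0.5) / (docFreq(t) + 0.5))

so if two tenants share an index, one tenant's documents change docCount and docFreq(t) for the other, and scores shift accordingly.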

Any way to solve this?

Can you reduce your primary count?

Good suggestion.

I could, and I probably will. It's currently 2p+1r; I could bring that down to 1p+1r, even though query performance may suffer a little from reduced query parallelism. I'd have to test how much actual impact that has.
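
If we do go that route, my understanding is that the _shrink API should get an existing 2-primary index down to 1 primary without a full reindex, roughly like this (index and node names are placeholders, and I'd double-check the steps against the docs first):

# 1) Make the index read-only and co-locate all shard copies on one node (required for shrink)
curl -XPUT 'http://localhost:9200/tenant-index/_settings' -H 'Content-Type: application/json' -d '
{ "index.routing.allocation.require._name": "one-of-our-data-nodes", "index.blocks.write": true }'

# 2) Shrink into a new index with a single primary
curl -XPOST 'http://localhost:9200/tenant-index/_shrink/tenant-index-1p' -H 'Content-Type: application/json' -d '
{ "settings": { "index.number_of_shards": 1, "index.number_of_replicas": 1 } }'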

You don't know of a way to solve the per-tenant term stats issue, do you?
I guess that even if I bring down the number of primaries, I'd run into the issue again down the road as the system scales.

Is the solution then to split into multiple separate clusters? I guess that would bring down the size of the cluster state, but I'm not sure if the size of that is actually an issue at all.

You could use routing, so that all of a user's docs end up in the same shard for scoring. But I think that may create more issues, as you'd potentially need an index with a lot of shards.
Or add more nodes, ideally aim for <600 shards per node.
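
For the routing option, that would look roughly like this (index name, tenant ID and endpoint are only examples; you still need a tenant filter in the query, since routing only selects the shard):

# Index a tenant's document with a routing value so all of that tenant's docs land on one shard
curl -XPUT 'http://localhost:9200/shared-index/_doc/1?routing=tenant42' -H 'Content-Type: application/json' -d '
{ "tenant_id": "tenant42", "title": "example doc" }'

# Search only that shard, and still filter by tenant to exclude other tenants sharing it
curl -XGET 'http://localhost:9200/shared-index/_search?routing=tenant42' -H 'Content-Type: application/json' -d '
{ "query": { "bool": { "filter": [ { "term": { "tenant_id": "tenant42" } } ], "must": [ { "match": { "title": "example" } } ] } } }'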

Ultimately, if you aren't suffering resourcing issues then it may be a moot point for your own benefit/comfort, but that per-node shard count is higher than we recommend.
