We've been observing some strange behaviour on our ES cluster for the last few weeks: the primary of one shard is much larger than its replica.
Here's the output of the shard details API. Notice shard 2 in particular, where the primary is 32.5 GB while the replica is ~19 GB. The global checkpoint and local checkpoint also seem off for shard 2.
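In case it helps, this is roughly how I'm pulling the shard sizes and checkpoints (a minimal sketch using Python's requests; the cluster URL and index name are placeholders for our actual values):

```python
import requests

ES = "http://localhost:9200"   # placeholder for our cluster endpoint
INDEX = "my-index"             # placeholder for the affected index

# Human-readable listing of primary vs replica store size per shard.
print(requests.get(
    f"{ES}/_cat/shards/{INDEX}",
    params={"v": "true", "h": "index,shard,prirep,state,docs,store,node"},
).text)

# Shard-level stats include the sequence-number checkpoints.
stats = requests.get(f"{ES}/{INDEX}/_stats", params={"level": "shards"}).json()
for copy in stats["indices"][INDEX]["shards"]["2"]:
    seq_no = copy["seq_no"]
    print(
        "primary" if copy["routing"]["primary"] else "replica",
        "local_checkpoint:", seq_no["local_checkpoint"],
        "global_checkpoint:", seq_no["global_checkpoint"],
    )
```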
I also checked the index stats at the shard level and found the following for shard 2: the translog size is around 22 GB for both the primary and the replica, which seems far too high compared to the other shards, whose translogs are mostly in the MB range. Here's the full data for the shard-level stats, and below that a sketch of how I'm pulling the translog numbers.
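This is roughly how I'm checking the translog size per shard copy (same placeholder URL and index name as above); only shard 2 reports translog sizes in the GB range:

```python
import requests

ES = "http://localhost:9200"   # placeholder
INDEX = "my-index"             # placeholder

# Restrict the stats to the translog metric, broken down per shard copy.
stats = requests.get(
    f"{ES}/{INDEX}/_stats/translog", params={"level": "shards"}
).json()

for shard_id, copies in stats["indices"][INDEX]["shards"].items():
    for copy in copies:
        tl = copy["translog"]
        print(
            "shard", shard_id,
            "primary" if copy["routing"]["primary"] else "replica",
            "ops:", tl["operations"],
            "size_in_bytes:", tl["size_in_bytes"],
            "uncommitted_size_in_bytes:", tl["uncommitted_size_in_bytes"],
        )
```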
There are a couple of instances of RecoveryFailedException on the node that holds the replica of shard 2, and there is a spike in JVM memory pressure just before the problem started. The sketch below shows how I'm checking recovery state and heap usage.
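For reference, this is how I'm looking at the recovery state for the index and the per-node heap usage (again, URL and index name are placeholders):

```python
import requests

ES = "http://localhost:9200"   # placeholder
INDEX = "my-index"             # placeholder

# Recoveries for the index, including the failed/retrying replica of shard 2.
print(requests.get(
    f"{ES}/_cat/recovery/{INDEX}",
    params={"v": "true", "h": "index,shard,time,type,stage,source_node,target_node"},
).text)

# Heap usage per node, to correlate with the JVM memory pressure spike.
jvm = requests.get(f"{ES}/_nodes/stats/jvm").json()
for node in jvm["nodes"].values():
    print(node["name"], node["jvm"]["mem"]["heap_used_percent"], "% heap used")
```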
Here are some of the error logs I found from around the time the problem started.
Do let me know if any more information is needed.
Any help would be greatly appreciated!