Cluster holds an odd number of shards

Hi

Wondering why the number of shards in our v8.19.9 cluster is odd like this, when every index ought to have exactly 1 replica:

$ eshealth
{
  "cluster_name" : "pjp-es-epj",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 15,
  "number_of_data_nodes" : 8,
  "active_primary_shards" : 457,
  "active_shards" : 915,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "unassigned_primary_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Would have expected active_shards = 914, as every index seems to have exactly 1 replica as expected, i.e. this returns no matches:

$ esapi -g '_cat/indices?h=index,rep' | grep -v ' 1$'
$ esapi -g '_cat/indices?h=index,rep' | grep -c ' 1$'
284

Wondering where that extra shard hides, hints appreciated, TIA.
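For the arithmetic: with exactly 1 replica per shard, active_shards should be exactly double active_primary_shards, so the cluster is carrying one extra shard copy somewhere:

```shell
# 457 primaries with 1 replica each -> 914 expected active shards,
# but health reports 915, i.e. one unexplained extra copy.
primaries=457
expected=$(( primaries * 2 ))
echo "$expected"
```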

Our monitoring cluster shows long-lived Shards Activity now and then, as if the activity never finishes, e.g.:

Only this shard seems fine IMHO:

Also wondering why the hot/warm data nodes' shard counts don't seem perfectly balanced:

Have you used the _cat/shards endpoint? Some internal indices may have auto-expand replicas, so I would not expect everything to have exactly 1 replica.
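A quick sketch of how you could spot indices whose replica count per primary isn't 1 by post-processing _cat/shards output with awk; the here-doc stands in for the real output of something like `esapi -g '_cat/shards?h=index,prirep,state'`:

```shell
# Sample _cat/shards output (index, prirep, state); replace the here-doc
# with the real esapi call on your cluster.
cat <<'EOF' > shards.txt
.apm-agent-configuration p STARTED
.apm-agent-configuration r STARTED
.apm-source-map p STARTED
.apm-source-map r STARTED
.apm-source-map r STARTED
EOF

# Count primaries and replicas per index; print any index whose
# replicas-per-primary ratio is not exactly 1.
awk '$2 == "p" { pri[$1]++ }
     $2 == "r" { rep[$1]++ }
     END { for (i in pri) if (rep[i] / pri[i] != 1) print i, rep[i] / pri[i] }' shards.txt
```

On the sample data this prints only `.apm-source-map 2`, the index with two replica copies per primary.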

Also, you seem to be using shrink in your lifecycle policy, so this number may be related to a running shrink task.

Shown where? Is this from AutoOps? I've seen some false positives with AutoOps.

In 8.6 the balancing heuristic was changed to also consider things like disk space and (depending on the license) write load, so I'm not sure we should expect per-node shard counts to be equal anymore, even though that is what the documentation says.

On my cluster the per-node shard counts are not equal either, but they are pretty close.

Ah, found it: the default APM index .apm-source-map, with 1 primary on node769 and 2 replicas on node765+node770 :slight_smile:

$ esapi -g '_cat/shards?h=index,shard,prirep,state,node' | sort | awk '/ r /{printf "  -  %s",$0} / p /{printf "\n%s",$0}'

.apm-agent-configuration                                           0 p STARTED node771  -  .apm-agent-configuration                                           0 r STARTED node769
.apm-custom-link                                                   0 p STARTED node769  -  .apm-custom-link                                                   0 r STARTED node771
.apm-source-map                                                    0 p STARTED node769  -  .apm-source-map                                                    0 r STARTED node765  -  .apm-source-map                                                    0 r STARTED node770
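For completeness, the auto-expand setting on that index can be confirmed via the index settings API; a sketch, where the here-doc stands in for the response to something like `esapi -g '.apm-source-map/_settings?filter_path=*.settings.index.auto_expand_replicas'` (the `0-2` value is what I'd expect for this system index, but verify on your cluster):

```shell
# Sample (assumed) settings response for the .apm-source-map index.
cat <<'EOF' > settings.json
{".apm-source-map":{"settings":{"index":{"auto_expand_replicas":"0-2"}}}}
EOF

# With 8 data nodes available, an auto_expand_replicas of "0-2" expands
# to 2 replicas, which accounts for the odd active_shards count.
grep -o '"auto_expand_replicas":"[^"]*"' settings.json
```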

Right, we are using DC allocation awareness routing, as we've got nodes spread across 3 DCs.

Thanks.