Validation Failed: 1: this action would add [8] total shards, but this cluster currently has [3997]/[4000] maximum shards open

Hello, Elastic!

I'm facing trouble while indexing data. I run both ES 8.11 and OpenSearch 2.11, and both have the same issue. Please help me.

I found out my shards had reached the maximum (1,000 shards per node).
My data consists of daily reports, and there are 5 types of them, which means 5 indices are created per day. The cluster has 4 nodes (including the master node), and I set 2 primary shards and 3 replicas for each index, so it quickly reached 4,000 shards (the total maximum).
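
To spell out the arithmetic as I understand it:

2 primary shards × (1 + 3 replicas) = 8 shards per index
8 shards/index × 5 indices/day = 40 new shards per day
4 nodes × 1,000 shards/node = 4,000 shards → the limit is hit in about 100 days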

When I searched for this problem here, I noticed there are two options:

  1. To increase the maximum shards per node
  2. To use the Shrink API

However, I'm not sure which one is suitable for my case.
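
For reference, this is how I understand each option would look (the new limit value and the index names below are just placeholders):

# Option 1: raise the per-node shard limit
PUT /_cluster/settings
{
  "persistent": {
    "cluster.max_shards_per_node": 1500
  }
}

# Option 2: shrink an index down to fewer primary shards
POST /type_a_20230914/_shrink/type_a_20230914-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1
  }
}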

The cat indices API (GET _cat/indices) returns:

index            pri rep docs.deleted store.size pri.store.size
...
type_a_20230914    2   3            0    127.6mb         31.9mb
type_a_20230915    2   3            0    122.2mb         30.5mb
type_a_20230916    2   3            0    110.6mb         27.6mb
type_a_20230917    2   3            0    107.5mb         26.8mb
type_a_20230918    2   3            0    123.9mb         30.9mb
...
type_b_20230810    2   3            0     91.2mb         22.8mb
type_b_20230811    2   3            0     90.3mb         22.5mb
type_b_20230812    2   3            0     76.8mb         19.2mb
type_b_20230813    2   3            0     76.2mb           19mb
type_b_20230814    2   3            0     91.9mb         22.9mb
...
type_c_20231106    2   3            0      1.1gb        289.6mb
type_c_20231107    2   3            0      1.1gb        287.1mb
type_c_20231108    2   3            0      1.1gb        283.4mb
type_c_20231109    2   3            0        1gb        276.9mb
type_c_20231110    2   3            0        1gb        275.8mb
type_c_20231111    2   3            0        1gb        256.3mb
type_c_20231112    2   3            0        1gb        271.6mb
...
type_e_20230809    2   3            0      2.9gb        759.4mb
type_e_20230810    2   3            0      3.1gb        814.7mb
type_e_20230811    2   3            0      3.1gb        798.1mb
type_e_20230812    2   3            0      2.8gb        739.5mb
type_e_20230813    2   3            0        3gb        777.2mb
type_e_20230814    2   3            0      2.2gb          582mb
type_e_20230815    2   3            0        4gb            1gb
type_e_20230816    2   3            0        4gb            1gb
type_e_20230817    2   3            0      4.1gb            1gb
...
type_d_20231101    2   3            0      1.2gb        315.8mb
type_d_20231102    2   3            0      1.3gb        341.7mb
type_d_20231103    2   3            0      1.2gb        322.4mb
type_d_20231104    2   3            0      1.2gb        312.1mb
type_d_20231105    2   3            0      1.4gb        366.7mb
type_d_20231106    2   3            0      1.4gb        379.5mb
type_d_20231107    2   3            0      1.5gb        378.1mb
type_d_20231108    2   3            0      1.4gb        388.4mb
...

The cat shards API (GET _cat/shards) returns:

index          shard prirep state          docs   store ip      node
type_a_20231019    0     r      STARTED   18599  17.4mb x.x.x.x 4e03dad3d2d11852693c15d373d52c19
type_a_20231019    0     r      STARTED   18599  17.4mb x.x.x.x f5143002590a21ac644a3b4a4ed3046c
type_a_20231019    0     p      STARTED   18599  17.4mb x.x.x.x 083d4cdd1deac015a8a7cf32926a9251
type_a_20231019    0     r      STARTED   18599  17.4mb x.x.x.x 36183f90dba3d5332e92b14edf00e0a1
...
type_c_20231125    0     r      STARTED  179252 133.3mb x.x.x.x f5143002590a21ac644a3b4a4ed3046c
type_c_20231125    0     r      STARTED  179252 132.2mb x.x.x.x 083d4cdd1deac015a8a7cf32926a9251
type_c_20231125    0     r      STARTED  179252 132.5mb x.x.x.x 36183f90dba3d5332e92b14edf00e0a1
type_c_20231125    1     r      STARTED  178699 132.2mb x.x.x.x 4e03dad3d2d11852693c15d373d52c19
type_c_20231125    1     r      STARTED  178699 132.4mb x.x.x.x f5143002590a21ac644a3b4a4ed3046c
type_c_20231125    1     p      STARTED  178699 130.9mb x.x.x.x 083d4cdd1deac015a8a7cf32926a9251
...
type_e_20230812    0     r      STARTED  532017 368.9mb x.x.x.x 4e03dad3d2d11852693c15d373d52c19
type_e_20230812    0     r      STARTED  532017 368.9mb x.x.x.x f5143002590a21ac644a3b4a4ed3046c
type_e_20230812    0     p      STARTED  532017 368.9mb x.x.x.x 083d4cdd1deac015a8a7cf32926a9251
type_e_20230812    0     r      STARTED  532017 368.9mb x.x.x.x 36183f90dba3d5332e92b14edf00e0a1
...
type_b_20230909    0     r      STARTED   34636  11.6mb x.x.x.x 36183f90dba3d5332e92b14edf00e0a1
type_b_20230909    1     p      STARTED   34849  10.7mb x.x.x.x 4e03dad3d2d11852693c15d373d52c19
type_b_20230909    1     r      STARTED   34849  10.7mb x.x.x.x f5143002590a21ac644a3b4a4ed3046c
type_b_20230909    1     r      STARTED   34849  10.7mb x.x.x.x 083d4cdd1deac015a8a7cf32926a9251
type_b_20230909    1     r      STARTED   34849  10.7mb x.x.x.x 36183f90dba3d5332e92b14edf00e0a1
type_b_20230906    0     r      STARTED   41630  14.4mb x.x.x.x 4e03dad3d2d11852693c15d373d52c19
...
type_d_20231130    0     r      STARTED  191010 166.5mb x.x.x.x 4e03dad3d2d11852693c15d373d52c19
type_d_20231130    0     p      STARTED  191010 169.2mb x.x.x.x f5143002590a21ac644a3b4a4ed3046c
type_d_20231130    0     r      STARTED  191010 170.8mb x.x.x.x 083d4cdd1deac015a8a7cf32926a9251
type_d_20231130    0     r      STARTED  191010 168.2mb x.x.x.x 36183f90dba3d5332e92b14edf00e0a1
type_d_20231130    1     r      STARTED  191136 164.6mb x.x.x.x 4e03dad3d2d11852693c15d373d52c19
...

The cluster stats API returns:

GET /_cluster/stats
...
"store": {
      "size": "682.2gb",
      "size_in_bytes": 732595542916,
      "reserved": "0b",
      "reserved_in_bytes": 0
    }
...

In this case, which approach is better?

Also, I've been thinking about making old indices (which no longer need new data) read-only and shrinking them. Is that possible?
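
Something like this is what I have in mind, if it works (the node name here is just a placeholder):

# 1. Block writes and gather a copy of every shard onto one node
PUT /type_a_20230914/_settings
{
  "settings": {
    "index.blocks.write": true,
    "index.routing.allocation.require._name": "node-1"
  }
}

# 2. Shrink the 2 primaries down to 1, clearing the temporary settings on the target
POST /type_a_20230914/_shrink/type_a_20230914-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1,
    "index.routing.allocation.require._name": null,
    "index.blocks.write": null
  }
}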

Thanks in advance!

OpenSearch/OpenDistro are AWS-run products and differ from the original Elasticsearch and Kibana products that Elastic builds and maintains. You may need to contact them directly for further assistance.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns :elasticheart: )

You have lots of very small shards. I would recommend the following:

  • Change the number of primary shards to 1 and set the number of replicas to 1 as well. That will reduce the shard count immediately and make it grow more slowly in the future (see the sketch after this list).
  • If you have a reasonably long retention period, I would recommend using monthly indices instead, at least for the small indices. The largest index seems to have 1GB of primary shards per day. If this were a monthly index with a single primary shard, it would be around 30GB, which is very reasonable.
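
As a sketch: the replica count can be lowered on existing indices right away, while the primary shard count only applies to newly created indices, so that part belongs in an index template (the template name and index pattern below are just examples; adjust them to your naming scheme):

# Reduce replicas on all existing indices immediately
PUT /type_*/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}

# Make 1 primary / 1 replica the default for newly created daily indices
PUT /_index_template/daily-reports
{
  "index_patterns": ["type_*"],
  "template": {
    "settings": {
      "index.number_of_shards": 1,
      "index.number_of_replicas": 1
    }
  }
}

Note that the primary shard count of an already created index cannot be changed through the settings API; for existing indices you would need the Shrink API or a reindex.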

@Christian_Dahlqvist

Thank you for your suggestion!

In fact, the daily reports are commonly re-indexed because of wrong data. That is the reason I designed them as daily indices. (Whenever I get a notification that the data went wrong, I need to delete the whole daily index and re-index it.)

But I guess changing my indices to 1 shard and 1 replica sounds reasonable. While changing index settings with the PUT settings API, does it affect searches at the same time?

Your shards are very small, and searching lots of very small shards is not necessarily faster than searching fewer larger shards. Search performance might even improve with fewer, larger shards, as you still have quite a lot of shards.

My largest index is actually over 4GB (type e for now, but the sizes of all the index types will increase in the near future). So with monthly indices, removing one day's bad data with delete_by_query and re-indexing it would take quite a long time, I guess.
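
For example, deleting one bad day from a monthly index would look something like this, I think (the index name and the report_date field are just for illustration):

POST /type_e_202308/_delete_by_query
{
  "query": {
    "range": {
      "report_date": {
        "gte": "2023-08-12",
        "lt": "2023-08-13"
      }
    }
  }
}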

I barely know about the infra stuff... so I just followed a sample setting (shards: 2, replicas: 3) when I started a few years ago. I'm wondering whether setting shards and replicas to 1 is still stable and fast?

If so, I at least want to change the settings.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.