Slow searches after changing daily to weekly indexes

Currently the configuration uses daily indices on a single-node cluster, so we decided to move to weekly indices in order to reduce the number of shards in the cluster.
Most of the time the client wants a 6-month history, which means we always have more than 1,000 shards, and in some cases 4,000 to 5,000. As the listings below show, many of the indices are much smaller than Elasticsearch advises.
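As a rough sanity check on those shard counts, here is a small sketch of the arithmetic. It assumes one primary shard and no replicas per index (as in the listings in this post) and a hypothetical six index families that each keep about 183 days of history; the real number of families is not stated in the thread.

```shell
# Rough shard-count estimate for a retention window.
# $1 = retention in days, $2 = days per index, $3 = number of index families (hypothetical).
# Assumes 1 primary shard and 0 replicas per index.
shard_count() {
  sc_retention=$1
  sc_period=$2
  sc_families=$3
  # Round up: a partially filled period still needs its own index.
  sc_per_family=$(( (sc_retention + sc_period - 1) / sc_period ))
  echo $(( sc_per_family * sc_families ))
}

shard_count 183 1 6   # daily rotation  -> prints 1098
shard_count 183 7 6   # weekly rotation -> prints 162
```

Under these assumptions, weekly rotation cuts the shard count by roughly a factor of seven, which is the cluster-overhead benefit being aimed for here, independent of what happens to individual search latency.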

The issue:
After reindexing the daily indexes into weekly ones, searches with a sort became considerably slower. This goes against everything I've been reading: I thought that having fewer indexes would make searches more efficient, since fewer shards would be queried.

Test:

Daily indexes:

green  open   index-2023-01-01 yBiABwMPShuiQ03LeQ7DEg   1   0     132304            0     30.6mb         30.6mb
green  open   index-2023-01-02 c_kB9wioRP29fuM-_85a9g   1   0     175048            0     40.5mb         40.5mb
green  open   index-2023-01-03 -b5KDGthSuqnPBPH5tOnLA   1   0     184778            0     41.9mb         41.9mb
green  open   index-2023-01-04 MmMnmv1_QSu5R8Uha7TjSg   1   0      86324            0     18.3mb         18.3mb
.....
green  open   index-2023-01-31 mxWh8y4XS-amRQ-uHpDyNQ   1   0     240864            0     59.9mb         59.9mb


time curl -X GET "10.10.10.10:9200/index-2023-01-*/_search" -H 'Content-Type: application/json' -d '{"size": 10000, "track_total_hits": false, "query": {"bool": { "must": [ { "range": { "index.date": { "from": 1674208736000, "to": null, "include_lower": true, "include_upper": true, "boost": 1.0 } } } ], "adjust_pure_negative": true, "boost": 1.0 } }, "sort": [ { "index.date": { "order": "desc" } } ] }' | wc -l
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.5M  100 13.5M  100   308  14.9M    340 --:--:-- --:--:-- --:--:-- 14.9M
0

real    0m0.917s
user    0m0.012s
sys     0m0.022s

Weekly indexes:

green  open   index-test-2022-12-25 AWbqQg_eR_iOxtRQXXoRlw   1   0        770            0    144.9kb        144.9kb
green  open   index-test-2023-01-01 gKKuyq0eR1OfMAsLERGe7A   1   0     924199            0      206mb          206mb
green  open   index-test-2023-01-08 CEx-SpAlRQ6Lc84Fsqz07A   1   0     804620            0    167.9mb        167.9mb
green  open   index-test-2023-01-15 7Y3ctKjvTCaPYSnozPsImg   1   0    1137348            0    240.9mb        240.9mb
green  open   index-test-2023-01-22 GYLau7xsS3eCaVFv6vdPEw   1   0    1214504            0    274.8mb        274.8mb
green  open   index-test-2023-01-29 TxIrCV89QzKNv6nk_M9nBw   1   0     415912            0     99.4mb         99.4mb

time curl -X GET "10.10.10.10:9200/index-test-*/_search" -H 'Content-Type: application/json' -d '{"size": 10000, "track_total_hits": false, "query": {"bool": { "must": [ { "range": { "index.date": { "from": 1674208736000, "to": null, "include_lower": true, "include_upper": true, "boost": 1.0 } } } ], "adjust_pure_negative": true, "boost": 1.0 } }, "sort": [ { "index.date": { "order": "desc" } } ] }' | wc -l
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 13.6M  100 13.6M  100   308  10.2M    232  0:00:01  0:00:01 --:--:-- 10.2M
0

real    0m1.339s
user    0m0.006s
sys     0m0.038s

Does anyone have a tip? Could it be that I would only see improvements under a heavy query load?

"I thought that having fewer indexes would make searches more efficient since fewer shards..."

This is only somewhat true. Fewer indices are generally recommended for cluster stability: every index carries overhead, so reducing the number of indices reduces the overhead on the cluster.

However, indices (via their shards) allow your data to be searched in parallel, so in theory the more indices (and therefore shards) you have, the more data can be searched in parallel.

  • Note: This only holds up to a point. If you have too many indices, you generally won't have enough threads to search them all at the same time, so the searches get queued.
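To make the threading limit concrete: as far as I know, the Elasticsearch documentation gives the default size of the `search` thread pool as int((allocated processors * 3) / 2) + 1, with a queue of 1000, but check the documentation for your version. The sketch below computes that size; `search_pool_size` is a hypothetical helper name, and the commented curl call uses the host from the commands in this thread.

```shell
# Default size of the "search" thread pool on one node:
# int((allocated_processors * 3) / 2) + 1
# $1 = number of allocated processors on the node.
search_pool_size() {
  echo $(( ($1 * 3) / 2 + 1 ))
}

search_pool_size 8   # a node with 8 allocated processors -> prints 13

# To see whether searches are actually queueing or being rejected on a live node:
# curl -s "10.10.10.10:9200/_cat/thread_pool/search?v&h=node_name,name,active,queue,rejected"
```

If the `queue` column stays at zero while the test searches run, thread starvation is unlikely to explain the difference between the daily and weekly layouts.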

The "Tune for search speed" page in the Elasticsearch documentation covers a number of things you can look at to improve search speed.

You can also use something like the Profile API to see where your searches spend their time and what could be improved.

Also, looking at the example you provided, your weekly indices still seem small compared with the recommended shard size of 10GB to 50GB. If your test weekly indices are similar to your production indices, you might want to consider testing monthly index rotation.
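A minimal sketch of the search from this thread with profiling switched on, which is done by adding `"profile": true` to the request body. `profiled_body` and `run_profiled_search` are hypothetical helper names. The range clause is written with `gte`, which should be equivalent to `from` with `include_lower: true` in the original command.

```shell
# Print the search body with profiling enabled.
# $1 = lower bound for index.date in epoch milliseconds.
profiled_body() {
  printf '{"size": 10000, "track_total_hits": false, "profile": true, "query": {"range": {"index.date": {"gte": %s}}}, "sort": [{"index.date": {"order": "desc"}}]}\n' "$1"
}

# Send the profiled search. $1 = host:port, $2 = index pattern, $3 = lower bound.
run_profiled_search() {
  curl -s -X GET "$1/$2/_search" -H 'Content-Type: application/json' -d "$(profiled_body "$3")"
}

# Show the body that would be sent:
profiled_body 1674208736000

# Against a live cluster (not executed here):
# run_profiled_search "10.10.10.10:9200" "index-test-*" 1674208736000
```

The response then carries a `profile.shards` array with per-shard timing breakdowns for the query and collector phases, so the daily and weekly layouts can be compared shard by shard.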


Thanks for the quick reply, Ben.

In production our indexes are a little bigger, around 400 MB per day (they are small anyway, I know 🙂).
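For scale, here is a rough sketch of what about 400 MB per day means for shard size under each rotation period. It assumes one primary shard per index, a steady daily volume, a 30-day month, and 1 GB = 1024 MB; both helper names are hypothetical.

```shell
# Approximate primary shard size in MB for a rotation period.
# $1 = MB ingested per day, $2 = days per index.
shard_size_mb() {
  echo $(( $1 * $2 ))
}

# Succeeds if a size in MB falls inside the 10GB to 50GB guideline.
in_recommended_range() {
  [ "$1" -ge 10240 ] && [ "$1" -le 51200 ]
}

for days in 1 7 30; do
  size=$(shard_size_mb 400 "$days")
  if in_recommended_range "$size"; then
    echo "$days-day rotation: ${size} MB (within 10GB to 50GB)"
  else
    echo "$days-day rotation: ${size} MB (outside 10GB to 50GB)"
  fi
done
```

With these numbers only the monthly rotation, at about 12,000 MB, reaches the guideline; weekly shards come out near 2,800 MB.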

I already tried monthly index rotation, and searches are a little slower than with weekly indexes. So, based on the tests I'm doing here, the larger my indexes are, the slower the searches will be (searches are slower only when using sort, which in our case is always).

I ended up choosing weekly indexes to have more threads working in parallel (as you mentioned). I wasn't expecting huge improvements in response times, but I wasn't expecting them to get worse either.

Aim for shard sizes between 10GB and 50GB:
In contrast, small shards carry proportionally more overhead and are less efficient to search. Searching fifty 1GB shards will take substantially more resources than searching a single 50GB shard containing the same data.

Is this statement still entirely true if there are only those 50GB of data in the cluster?

Could it be that I'm not getting representative results because I'm using very small indexes/shards in these tests?

Thanks again!

I would definitely be using the profile approach, not curl with time.
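Alongside profiling, one quick way to take client-side noise out of the comparison is to read the `took` field of the search response, which reports the server-side execution time in milliseconds. The `real` time from `time curl` also includes transferring and receiving the roughly 13.5 MB response. The sketch below uses only grep, and the sample response carries a made-up value; `extract_took` is a hypothetical helper name.

```shell
# Extract the top-level "took" value (milliseconds) from a search response on stdin.
# Relies on "took" being the first field of the response, which is how
# Elasticsearch normally serializes it.
extract_took() {
  grep -o '"took" *: *[0-9][0-9]*' | head -n 1 | grep -o '[0-9][0-9]*$'
}

# Sample response with a made-up value, standing in for a real one:
printf '{"took":412,"timed_out":false,"hits":{"hits":[]}}\n' | extract_took

# Against a live cluster (not executed here), using a reduced form of the thread's query:
# curl -s "10.10.10.10:9200/index-test-*/_search" -H 'Content-Type: application/json' \
#   -d '{"size": 10000, "track_total_hits": false, "query": {"range": {"index.date": {"gte": 1674208736000}}}, "sort": [{"index.date": {"order": "desc"}}]}' | extract_took
```

Running each variant several times and comparing `took` values also helps separate cold-cache effects from a real difference between the two layouts.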

Also, how impactful is this difference of roughly 400ms?
