Two simple and quite basic questions regarding search performance in Elasticsearch

smm · April 2, 2025, 5:48am

Dear community, I have two simple and quite basic questions about search performance in Elasticsearch in general:
If I am only interested in searching the documents/indices within a time window of at most the latest 24 hours, does it affect search performance if I also keep older data on disk that I do nothing with?
In other words: if I only search hot data (max. 24 hours), does it matter for search performance that 30 days of data, at 300 GB per day, are also sitting on the disks, and that this amount may still be growing?

Background: I have search performance issues. If I theoretically cut the amount of data from 60 days to 24 hours, would it make a noticeable, positive impact on search performance (given that I only search the latest 24 hours)?

Thank you very much for any insight in this!
Kind regards

Christian_Dahlqvist · April 2, 2025, 5:58am

Which version of Elasticsearch are you using?

Are you using time-based indices?

smm · April 2, 2025, 6:03am

I am using Elasticsearch 8.14.3. In about a month the cluster will be upgraded to 8.17.x, and for now it runs on a Basic license.
Yes, these are syslog documents stored in a daily data stream.
Cheers

Christian_Dahlqvist · April 2, 2025, 6:29am

In newer versions, Elasticsearch is quite efficient at ruling out indices that cannot contain any matching data based on the timestamp range, so I would not expect much difference. You can probably test this by manually running a query with a timestamp filter against all indices, and then against only the indices you expect to find matches in.

Queries against many indices may require data to be fetched from disk, so if you have low I/O performance and a lot of indices, storage performance could make a difference.
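A minimal sketch of the timestamp-filtered query, written in Python and assuming the data stream's documents carry the standard `@timestamp` field. The range clause sits in filter context, and `shard_summary` reads the `took` and `_shards` fields of a parsed `_search` response. If `skipped` is close to `total`, the older backing indices were ruled out before being searched. As far as I know this pre-filter step does not run for every request (it depends on factors such as the number of shards targeted and whether indices are read-only), so a `skipped` count of 0 does not by itself prove the old data was scanned.

```python
import json


def last_24h_search_body(field="@timestamp"):
    """Search body that only matches documents from the last 24 hours.

    The range clause is in filter context (not scored), which lets
    Elasticsearch compare it with each shard's timestamp range.
    """
    return {
        "size": 0,  # only hit count and timing matter for this test
        "track_total_hits": True,
        "query": {
            "bool": {
                "filter": [
                    {"range": {field: {"gte": "now-24h", "lte": "now"}}}
                ]
            }
        },
    }


def shard_summary(response):
    """Extract timing and shard counts from a parsed _search response."""
    shards = response["_shards"]
    return {
        "took_ms": response["took"],
        "total_shards": shards["total"],
        "skipped_shards": shards.get("skipped", 0),
    }


print(json.dumps(last_24h_search_body(), indent=2))
```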

smm · April 2, 2025, 8:15am

Hi Christian, thanks for your insight! Actually, the company I joined a few months ago uses spinning disks instead of SSD/NVMe, so search performance on hot data is low, and I have to come up with a way to partly compensate for this unfortunate setup.

Christian_Dahlqvist · April 2, 2025, 8:21am

Try the test I mentioned. First query just the newest indices by naming them and then run the same query against all indices irrespective of age. Make sure you are using a timestamp range that just matches the data in the first indices. The second query may hit the cache on the previously queried indices but that is fine as you want to see the potential impact of querying the rest of them. Please share here what the difference in latency is and how many indices were involved in each query.
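A rough way to run that comparison, sketched with Python's standard library only. It assumes an unsecured cluster reachable at `http://localhost:9200` (a real cluster will likely need TLS and credentials added to the request), and the data stream name `logs-syslog-default` used below is a hypothetical placeholder. Each call reports the server-side `took`, the client wall-clock time, and the `_shards` block, so you can see how many shards each query touched.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local, unsecured cluster


def build_search_request(base_url, targets, body):
    """Build a POST /<targets>/_search request for a list of index or data stream names."""
    url = f"{base_url}/{','.join(targets)}/_search"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def timed_search(base_url, targets, body):
    """Run one search and return server time, client time, and shard stats."""
    req = build_search_request(base_url, targets, body)
    start = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    wall_ms = (time.perf_counter() - start) * 1000
    return {
        "targets": targets,
        "took_ms": result["took"],
        "wall_ms": round(wall_ms, 1),
        "shards": result["_shards"],
    }


def compare(base_url, recent_indices, all_targets, body):
    """Query the newest indices first, then everything, as suggested above."""
    return [
        timed_search(base_url, recent_indices, body),
        timed_search(base_url, all_targets, body),
    ]
```

For example, call `compare(BASE_URL, newest, ["logs-syslog-default"], body)`, where `newest` holds the names of the backing indices covering the last 24 hours (as listed by `GET _data_stream/logs-syslog-default`) and `body` uses a `@timestamp` range filter for that same window.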

There is not necessarily any magic solution that fixes slow storage. If you can get a couple of nodes with faster storage, though, I would recommend a hot-warm architecture.
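A hot-warm setup is usually driven by an index lifecycle management (ILM) policy. Below is a minimal sketch of such a policy body for `PUT _ilm/policy/<name>`, built in Python. The ages (daily rollover, warm after 1 day, delete after 60 days), the 50gb shard-size cap and the priorities are illustrative assumptions, not values from this thread. It also assumes the fast nodes carry the `data_hot` role and the spinning-disk nodes the `data_warm` role (via `node.roles` in `elasticsearch.yml`), and that the data stream's index template sets `index.lifecycle.name` to this policy.

```python
import json


def hot_warm_policy(rollover_age="1d", warm_after="1d", delete_after="60d"):
    """ILM policy body: write to hot nodes, move to warm nodes, then delete.

    All ages and sizes are illustrative assumptions.
    """
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_age": rollover_age,
                            "max_primary_shard_size": "50gb",
                        },
                        "set_priority": {"priority": 100},
                    }
                },
                "warm": {
                    "min_age": warm_after,
                    # No explicit migrate action: ILM adds one implicitly,
                    # moving shards to nodes with the data_warm role.
                    "actions": {"set_priority": {"priority": 50}},
                },
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


print(json.dumps(hot_warm_policy(), indent=2))
```

With a policy like this, searches over the last 24 hours would mostly hit the hot tier, while the older data lives on the slower disks.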

RainTown · April 2, 2025, 9:05am

Be wary of spending more time/effort on working around the core issue than just fixing the core issue.

It doesn’t cost that much for decent IO performance in 2025.

smm · April 2, 2025, 11:07am

...tell that to my company.
In any case, I totally agree with you!

Related topics

Topic | Category | Replies | Views | Activity
Elasticsearch using SSD vs HDD comparison | Elasticsearch | 5 | 8296 | April 25, 2019
ES Index performance | Elasticsearch | 26 | 999 | July 6, 2017
Query Performance in ES | Elasticsearch | 6 | 322 | September 8, 2020
Indexes and time to keep information | Elasticsearch | 5 | 3917 | February 20, 2017
Questions from a newbie | Elasticsearch | 15 | 420 | July 6, 2017