I’m facing an issue where the same query can take from 500ms to more than 30s.
We are working with rather simple data stored in data streams, but we also reproduced this behavior with classic Elasticsearch indices.
I have profiled the query and successfully reproduced similar issues with very simple queries such as a term filter or a range filter: { "query": {"bool": {"filter": [ { "range": {"_expirationDate": { "gte": "2024-04-10T14:11:40Z" } } } ]} }, "size": 501}.
For instance, about 12s was spent on this range query alone against a single index (all of it in the match section of the profiler).
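To make the profiler output easier to compare between slow and fast runs, here is a minimal Python sketch. It rebuilds the range query above with `"profile": true` and ranks the breakdown phases per shard. The helper names `build_profiled_query` and `slowest_phases` are made up for illustration, and the code assumes the usual profile response layout (`profile.shards[].searches[].query[].breakdown`). It only looks at top-level query nodes.

```python
from typing import Any, Dict, List, Tuple


def build_profiled_query(expiration_from: str, size: int = 501) -> Dict[str, Any]:
    """Return the range-filter search body from the post, with profiling enabled."""
    return {
        "profile": True,
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"range": {"_expirationDate": {"gte": expiration_from}}}
                ]
            }
        },
    }


def slowest_phases(
    profile_response: Dict[str, Any], top: int = 3
) -> Dict[str, List[Tuple[str, float]]]:
    """Map each shard id to its `top` most expensive breakdown phases, in ms.

    Only top-level query nodes are inspected; children are not descended into.
    """
    report: Dict[str, List[Tuple[str, float]]] = {}
    for shard in profile_response["profile"]["shards"]:
        totals: Dict[str, int] = {}
        for search in shard["searches"]:
            for node in search["query"]:
                for phase, value in node["breakdown"].items():
                    if phase.endswith("_count"):
                        continue  # skip call counters, keep only timings
                    totals[phase] = totals.get(phase, 0) + value
        ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]
        report[shard["id"]] = [(phase, nanos / 1e6) for phase, nanos in ranked]
    return report
```

The body returned by `build_profiled_query("2024-04-10T14:11:40Z")` can be sent to `_search` with any HTTP client, and the JSON response passed to `slowest_phases` to see whether `match` dominates on every shard or only on some.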
If I run the same query directly after, the performance will be much better thanks to the cache.
I also investigated Elasticsearch metrics in Kibana: CPU is always below 30% and JVM heap memory is fine, but there are spikes in read I/O that might explain the issue.
I suspect a configuration issue, maybe with the index shards or a lack of OS cache memory, but I’m fairly new to ES and not sure how to continue my investigation.
Thanks for the quick response. Here is the information you asked for; don’t hesitate to ask if you need anything else.
GET /_cat/nodes?v (sorry, I wasn't able to copy it as a table or as a proper image). We can see that ram.percent is very high, but after reading about it I'm not sure it is really an issue.
I have 91 indices, but here is a representative example (the issue is present for queries on both types of index: video metadata and motion events (data stream)).
health status index        pri rep docs.count docs.deleted store.size pri.store.size / dataset.size
Green  open   Index-type-1   1   1  193079552            0       60gb 30gb
Green  open   Index-type-2   1   1    9313735        60344      2.5gb 1.2gb
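As a rough sanity check on the two rows above, the following Python sketch converts the `_cat` size strings to bytes and derives the average primary bytes per document. It assumes the `gb` suffix means 1024^3 bytes. The function names are hypothetical, and `uncached_bytes` is a deliberately naive estimate of how much on-disk data cannot fit in the OS page cache, given a node's RAM and JVM heap (values I can't state for this cluster, so they are left as parameters).

```python
import re

# Assumption: _cat sizes use binary units (1 gb = 1024**3 bytes).
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}


def parse_size(text: str) -> int:
    """Convert a size string such as '2.5gb' into a number of bytes."""
    match = re.fullmatch(r"\s*([0-9]*\.?[0-9]+)\s*([kmgtp]?b)\s*", text.lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def bytes_per_doc(pri_store_size: str, docs_count: int) -> float:
    """Average on-disk bytes per document on the primary shards."""
    return parse_size(pri_store_size) / docs_count


def uncached_bytes(store_bytes: int, ram_bytes: int, heap_bytes: int) -> int:
    """Naive estimate: data on a node that cannot fit in RAM left over after the heap."""
    page_cache = max(0, ram_bytes - heap_bytes)
    return max(0, store_bytes - page_cache)


print(round(bytes_per_doc("30gb", 193079552)))  # Index-type-1
print(round(bytes_per_doc("1.2gb", 9313735)))   # Index-type-2
```

With the figures from the table this gives roughly 167 and 138 bytes per document, so the documents themselves are small. The larger index does, however, hold its full 30gb of primary data in a single shard.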
General information
Hardware profile: General purpose (was using Storage Optimized before and had the same issue)
Global Memory (it’s a test system so we currently don’t have much warm/cold)
I just tested without the size parameter, and the results are probably a bit better but overall similar (it’s hard to be sure because the query time is very unpredictable, but I still had an index taking 7s to answer the time range query).