Query intermittent performance issue

Hello,

I’m on Elastic Cloud 8.12.

I’m facing an issue where the same query can take from 500ms to more than 30s.

We are working with rather simple data stored in data streams but we also reproduced this behavior with classic Elasticsearch indices.

I have profiled the query and reproduced similar behaviour with very simple queries, such as a term filter or a range filter: { "query": { "bool": { "filter": [ { "range": { "_expirationDate": { "gte": "2024-04-10T14:11:40Z" } } } ] } }, "size": 501 }.
For instance, about 12s was spent on this range query alone against a single index (all of it in the match section of the profiler).
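For reference, this is roughly how I run it from Dev Tools with profiling enabled (the index name is just a placeholder):

GET /my-index/_search
{
  "profile": true,
  "size": 501,
  "query": {
    "bool": {
      "filter": [
        { "range": { "_expirationDate": { "gte": "2024-04-10T14:11:40Z" } } }
      ]
    }
  }
}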

If I run the same query again immediately afterwards, performance is much better thanks to the cache.
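For re-testing, I believe the relevant caches can be cleared between runs with something like this, so the numbers stay comparable (again, the index name is a placeholder):

POST /my-index/_cache/clear?request=true&query=true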

I also investigated the Elasticsearch metrics in Kibana: CPU is always below 30% and JVM heap memory is fine, but there are spikes in read I/O that might explain the issue:

(screenshot of the read I/O metric)
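If raw numbers are more useful than the screenshot, I believe the same disk statistics can also be pulled from the node stats API, something like this (filter_path is only there to trim the response):

GET /_nodes/stats/fs?filter_path=nodes.*.fs.io_stats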

I suspect a configuration issue, maybe with the index shards or a lack of OS cache memory, but I'm fairly new to ES and not sure how to continue my investigation.

Thanks,
Paul

Bonjour Paul :wink:

What is the output of:

GET /_cat/nodes?v
GET /_cat/health?v
GET /_cat/indices?v

What is the configuration you chose for the cloud instance? What kind of "hardware profile" and how much memory?

I also suspect that if you remove:

"size": 501

from the request, it could be faster. Could you check that?
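That is, running just the filter on its own, something like this (the index name is a placeholder):

GET /your-index/_search
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "_expirationDate": { "gte": "2024-04-10T14:11:40Z" } } }
      ]
    }
  }
}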

Hello,

Thanks for the quick response. Here is the info you asked for; don't hesitate to ask if you need anything else.

  1. GET /_cat/nodes?v (sorry, I wasn't able to copy it as a table or as a proper image). We can see that ram.percent is very high, but after reading about it I'm not sure it is really an issue.

heap.percent ram.percent cpu load_1m load_5m load_15m node.role master
          49         100   0    1.23    0.91     0.86 rw        -
          86         100   0    1.77    2.24     2.11 rw        -
          34          85   1    1.69    1.40     1.43 cr        -
          45          99   0    1.94    1.86     1.70 mv        -
          20          99  14    4.79    4.20     3.74 himrst    *
          51         100  12    2.51    3.32     3.60 himrst    -

  2. GET /_cat/health?v
status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
green 6 8 254 127 0 0 0 0 - 100%
  3. GET /_cat/indices?v

I have 91 indices, but here is a representative example (the issue is present for queries on both types of index: video metadata and motion events (data stream)).

health status index        pri rep docs.count docs.deleted store.size pri.store.size / dataset.size
green  open   Index-type-1   1   1  193079552            0       60gb 30gb
green  open   Index-type-2   1   1    9313735        60344      2.5gb 1.2gb
  4. General information
  • Hardware profile: General purpose (was using Storage Optimized before and had the same issue)

  • Global memory (it's a test system so we currently don't have much warm/cold)
    (screenshot of the deployment memory configuration)

  • I just tested without the size parameter and the results are perhaps slightly better but overall similar (it's hard to be sure since the query time is very unpredictable, but I still had an index taking 7s to answer the time range query).

Thanks,
Paul